Eigen-Based Transceivers for the 
MIMO Broadcast Channel with 
Semi-Orthogonal User Selection 

Liang Sun, Student Member, IEEE and Matthew R. McKay, Member, IEEE 



Abstract 

This paper studies the sum rate performance of two low complexity eigenmode-based transmission 
techniques for the MIMO broadcast channel, employing greedy semi-orthogonal user selection (SUS). 
The first approach, termed ZFDPC-SUS, is based on zero-forcing dirty paper coding; the second approach, 
termed ZFBF-SUS, is based on zero-forcing beamforming. We first employ new analytical methods to 
prove that as the number of users K grows large, the ZFDPC-SUS approach can achieve the optimal sum 
rate scaling of the MIMO broadcast channel. We also prove that the average sum rates of both techniques 
converge to the average sum capacity of the MIMO broadcast channel for large K. In addition to the 
asymptotic analysis, we investigate the sum rates achieved by ZFDPC-SUS and ZFBF-SUS for finite K, 
and show that ZFDPC-SUS has significant performance advantages. Our results also provide key insights 
into the benefit of multiple receive antennas, and the effect of the SUS algorithm. In particular, we show
that whilst multiple receive antennas only improve the asymptotic sum rate scaling via the second-order
behavior of the multi-user diversity gain, for finite K the benefit can be very significant. We also show
the interesting result that the semi-orthogonality constraint imposed by SUS, whilst facilitating a very
low complexity user selection procedure, asymptotically does not reduce the multi-user diversity gain in
either the first-order ($\log K$) or second-order ($\log\log K$) terms.

I. Introduction

In the multiple-input multiple-output (MIMO) broadcast channel, the spatial multiplexing capability
of multiple transmit antennas can be exploited to efficiently serve multiple users simultaneously, rather
than trying to maximize the capacity of a single-user link. The capacity region of the MIMO broadcast
channel has now been well-studied [1-5], and has been shown to be achieved through the use of multiple
antenna dirty paper coding (DPC) [3]. Unfortunately, optimal DPC is a highly non-linear technique
involving joint optimization over a set of power-constrained covariance matrices, and is therefore too
complex for practical implementation [4]. A reduced complexity sub-optimal DPC scheme, known as
zero-forcing dirty paper coding (ZFDPC), was proposed for single-antenna users in [5], and generalized
to multiple-antenna users in [6]; the scheme is based on a QR decomposition of the channel matrix.

To further reduce complexity, linear processing schemes such as beamforming (BF) have also attracted 
a lot of attention. The zero-forcing beamforming (ZFBF) scheme was first introduced for single-antenna 
users in [5], and further modified in [7] and [8]. In [9], the concept of block-diagonalization was proposed 
for multiple-antenna users, which completely cancels the inter-user interference by employing a set of 
precoding matrices. One key limitation of these techniques is that, for ZFDPC and ZFBF, the number
of users that can be supported simultaneously cannot exceed the number of transmit antennas, whereas
for block-diagonalization, the number of transmit antennas must be larger than the aggregate number
of receive antennas across all users. This is significant, since the number of users in practice can be large.

When the number of users K is larger than the number of transmit antennas M, one must select 
a subset of users in the system. A common approach is to seek the subset of users which yields the 
maximum sum rate. The complexity of finding the optimal subset, however, can be prohibitively large, and 
to reduce complexity, greedy algorithms are commonly employed (see e.g., [10-12]). A promising way
to further reduce the complexity of user selection is to restrict the search space of users by imposing
some constraint on the channels of the selected users. Following this method, [13] proposed a
semi-orthogonal user selection (SUS) algorithm which iteratively searches for users with nearly orthogonal
channel directions¹.

In this paper, we consider low complexity transmission and user selection techniques for the MIMO
broadcast channel with multiple-antenna users. It is still not clear how much advantage can be gained
by employing multiple antennas at the user terminals. Some recent exceptions which deal with the
multiple-antenna user scenario are presented in [14] and [15]. In particular, [14] proposed a generalized
G-ZFDPC approach, based on the idea of eigenmode transmission (eigen-beamforming). A limitation of
that approach is the relatively high complexity, since it requires numerical optimization of certain system
parameters. In [15], a thresholding technique based on the channel singular values was proposed, and
necessary and sufficient conditions were given to achieve the optimum sum capacity of DPC as $K \to \infty$.
However, for that scheme, the optimal threshold must be computed by exhaustive search, and is once
again quite complicated when the number of users is not small.

¹More specifically, two complex vectors $\mathbf{u}$ and $\mathbf{v}$, with unit norm, are said to be semi-orthogonal if $|\mathbf{u}^H\mathbf{v}|^2 < \delta$, where $\delta$ is
referred to as the semi-orthogonality parameter.

In this paper, we investigate two low complexity eigen-beamforming-based transceiver structures for the
MIMO broadcast channel with multiple-antenna users, combined with a greedy SUS algorithm. The first
technique is a generalization of the G-ZFDPC approach in [10] to account for multiple-antenna users,
combined with SUS. We refer to this technique as ZFDPC-SUS. The second technique is a generalization
of the algorithm proposed in [13], which we refer to as ZFBF-SUS. For both techniques, we present an
asymptotic performance analysis of the sum rate (as in [6, 13-17]) as the number of users grows large.
In particular, by employing novel analytical techniques, we demonstrate that ZFDPC-SUS achieves the
optimal sum capacity scaling of the MIMO broadcast channel as the number of users grows large. In
addition, we prove the more powerful result that the difference between the sum rate of ZFDPC-SUS
and the sum capacity of the MIMO broadcast channel converges to zero. We also establish a similar
result for ZFBF-SUS. In addition to the asymptotic analysis, we also investigate the sum rates achieved
by ZFDPC-SUS and ZFBF-SUS for finite K, for high and low signal-to-noise ratios (SNR). Based on
our analytical results, we establish a number of important insights. For example, we demonstrate that
employing multiple antennas at the user terminals only affects the asymptotic sum rate scaling via the
second-order behavior of the multi-user diversity gain. Thus, the improvement due to having multiple
receive antennas at the terminals is much less than that of having multiple transmit antennas, which
provides linear capacity growth through spatial multiplexing gain. However, for finite K, we show that
the performance improvement due to multiple receive antennas can still be very significant. We also
establish key insights into the design of the semi-orthogonality parameter used in the SUS algorithm.
In particular, it has been claimed previously that the semi-orthogonality constraint will cause multi-user
diversity gain reduction [13]. However, through our asymptotic analysis, we show that if some very mild
conditions on the semi-orthogonality constraint are met, then the semi-orthogonality parameter does not
reduce the multi-user diversity gain in either first or second order, for both ZFDPC-SUS and ZFBF-SUS.
It seems that this conclusion cannot be established by using previous analytical methods for SUS [13].
Our analysis also leads to practical design guidelines for selecting the semi-orthogonality parameter for
finite numbers of users, in order to intelligently trade off complexity and performance. Our analysis also
demonstrates that for finite values of K, ZFDPC-SUS can significantly outperform ZFBF-SUS.

II. Channel and System Model

We consider a MIMO broadcast channel with $M$ transmit antennas and $K$ users, with $K > M$. User
$k$ is equipped with $N_k$ antennas. In a flat-fading environment, the baseband model of this system is

$$ \mathbf{y}_k = \mathbf{H}_k \mathbf{s} + \mathbf{n}_k, \quad 1 \le k \le K, \tag{1} $$

where $\mathbf{y}_k \in \mathbb{C}^{N_k \times 1}$ is the received signal vector of user $k$, $\mathbf{H}_k \in \mathbb{C}^{N_k \times M}$ denotes the channel matrix
from the transmitter to user $k$, $\mathbf{s} \in \mathbb{C}^{M \times 1}$ represents the transmit signal vector, designed to meet the
total power constraint $\mathrm{Tr}(\mathcal{E}\{\mathbf{s}\mathbf{s}^H\}) \le P$, and $\mathbf{n}_k \in \mathbb{C}^{N_k \times 1}$ is white Gaussian noise with zero mean and
covariance matrix $\mathbf{I}_{N_k}$. Throughout the paper, we assume (as in [5, 13, 14, 18]) that (i) the channels of
all users are subject to uncorrelated Rayleigh fading and, for simplicity, all users are homogeneous and
experience statistically independent fading, (ii) the transmitter has perfect CSI of all downlink channels²,
and (iii) each user only has access to their own CSI, but not the CSI of the downlink channels of the
other users.

The transmitter supports $L \le M$ simultaneous data streams, shared by at most $L$ selected users (active
users), which are indexed by $\pi(i)$, $i = 1, 2, \dots, L$. (Note that the specific user selection algorithm will
be discussed in Section III.) The transmitted signal vector is represented as

$$ \mathbf{s} = \mathbf{W}\mathbf{P}^{1/2}\mathbf{x}, \tag{2} $$

where $\mathbf{x} = [x_1, x_2, \dots, x_L]^T$ collects the zero-mean circularly symmetric complex Gaussian information
signals for each of the $L$ data streams, satisfying $\mathcal{E}\{\mathbf{x}\mathbf{x}^H\} = \mathbf{I}_L$, $\mathbf{P} = \mathrm{diag}\{p_1, p_2, \dots, p_L\}$
accounts for the power loading across the multiple streams, chosen to satisfy $\sum_{i=1}^{L} p_i \le P$, and
$\mathbf{W} = [\mathbf{w}_1, \mathbf{w}_2, \dots, \mathbf{w}_L] \in \mathbb{C}^{M \times L}$ represents the precoder matrix, with $\mathbf{w}_i$ denoting the beamforming
vector for the $i$-th stream (i.e. for user $\pi(i)$), normalized to satisfy $\|\mathbf{w}_i\|^2 = 1$. Note that with this
formulation, a given user may be assigned multiple data streams.
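From (2), the received signal vector for user $k$ can be rewritten as

$$ \mathbf{y}_k = \mathbf{H}_k \mathbf{W}\mathbf{P}^{1/2}\mathbf{x} + \mathbf{n}_k. \tag{3} $$

It is convenient to represent $\mathbf{H}_k$ via its singular value decomposition (SVD) $\mathbf{H}_k = \mathbf{U}_k \boldsymbol{\Sigma}_k \mathbf{V}_k^H$, where $\boldsymbol{\Sigma}_k$
is an $N_k \times M$ diagonal matrix containing the singular values of $\mathbf{H}_k$ in decreasing order along its main
diagonal, and $\mathbf{U}_k = [\mathbf{u}_{k,1}, \mathbf{u}_{k,2}, \dots, \mathbf{u}_{k,N_k}] \in \mathbb{C}^{N_k \times N_k}$ and $\mathbf{V}_k = [\mathbf{v}_{k,1}, \mathbf{v}_{k,2}, \dots, \mathbf{v}_{k,M}] \in \mathbb{C}^{M \times M}$ are
unitary matrices with $\mathbf{u}_{k,j}$ and $\mathbf{v}_{k,j}$ representing the left and right singular vectors corresponding to the
$j$-th largest singular value $\sqrt{\lambda_{k,j}}$.

²This assumption is reasonable in time division duplex (TDD) systems, which allow the transmitter to employ reciprocity to
estimate the downlink channels.

To detect the data stream $i$, user $\pi(i)$ left multiplies the received vector by $\mathbf{u}^H_{\pi(i),d_i}$ as follows:

$$ r_{\pi(i),d_i} = \mathbf{u}^H_{\pi(i),d_i}\mathbf{y}_{\pi(i)} = \sqrt{\lambda_{\pi(i),d_i}}\, \mathbf{v}^H_{\pi(i),d_i} \mathbf{W}\mathbf{P}^{1/2}\mathbf{x} + n_{\pi(i),d_i}, \tag{4} $$

where $n_{\pi(i),d_i} = \mathbf{u}^H_{\pi(i),d_i}\mathbf{n}_{\pi(i)} \sim \mathcal{CN}(0, 1)$ is the effective additive white Gaussian noise after processing,
and $d_i$ denotes the eigen-mode index for stream $i$, chosen according to the selection procedure outlined
in Section III. Collecting the processed signals (4) for each of the $L$ data streams, we may write

$$ \mathbf{r} = \mathbf{C}_{\pi,d}\mathbf{W}\mathbf{P}^{1/2}\mathbf{x} + \mathbf{n} = \boldsymbol{\Lambda}^{1/2}\boldsymbol{\Xi}_{\pi,d}\mathbf{W}\mathbf{P}^{1/2}\mathbf{x} + \mathbf{n}, \tag{5} $$

where $\mathbf{C}_{\pi,d} = [\mathbf{c}^T_{\pi(1),d_1}, \mathbf{c}^T_{\pi(2),d_2}, \dots, \mathbf{c}^T_{\pi(L),d_L}]^T$ is the composite channel matrix for the selected users and
eigen-channel set with $i$-th row vector $\mathbf{c}_{\pi(i),d_i} = \sqrt{\lambda_{\pi(i),d_i}}\,\mathbf{v}^H_{\pi(i),d_i}$, $\boldsymbol{\Lambda} = \mathrm{diag}\{\lambda_{\pi(1),d_1}, \dots, \lambda_{\pi(L),d_L}\}$,
$\boldsymbol{\Xi}_{\pi,d}$ is the $L \times M$ matrix with $i$-th row $\mathbf{v}^H_{\pi(i),d_i}$, and $\mathbf{n} = [n_{\pi(1),d_1}, n_{\pi(2),d_2}, \dots, n_{\pi(L),d_L}]^T$.

In the next section, we will describe several transceiver structures, as well as a greedy method for
selecting the set of active users $\pi = \{\pi(1), \dots, \pi(L)\}$ and the corresponding eigen-channels (active
eigen-channels) $d = \{d_1, \dots, d_L\}$.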
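The receive processing in (4) can be checked numerically. The following is a minimal NumPy sketch, assuming a single i.i.d. Rayleigh channel draw; the helper name `eigen_channel` and the sizes M = 4, N = 2 are illustrative choices and do not appear in the paper. It confirms that left-multiplying the channel by a left singular vector leaves the corresponding right singular vector scaled by the singular value.

```python
import numpy as np

def eigen_channel(H, d):
    """Return (lam, u, v) for the d-th (0-based) eigen-channel of H.

    lam is the squared singular value, u and v are the matching left and
    right singular vectors (as 1-D arrays holding column-vector entries).
    """
    U, s, Vh = np.linalg.svd(H)            # H = U @ diag(s) @ Vh
    return s[d] ** 2, U[:, d], Vh[d].conj()

rng = np.random.default_rng(0)
M, N = 4, 2                                # hypothetical antenna counts
H = (rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))) / np.sqrt(2)

lam, u, v = eigen_channel(H, 0)            # principal eigen-channel
c = u.conj() @ H                           # effective row channel u^H H
# c should equal sqrt(lam) * v^H, the row vector appearing in (4)
print(np.allclose(c, np.sqrt(lam) * v.conj()))
```

The effective scalar noise keeps unit variance because the left singular vector has unit norm, which is why only the right singular vectors and the eigenvalues enter the composite channel.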

III. Transceiver Structures and User Selection Algorithm 
A. Greedy Zero-Forcing Dirty Paper Coding Algorithm 

In this subsection, we present a transmission strategy which jointly combines ZF, DPC, and
eigen-beamforming, along with a greedy low complexity SUS scheduling algorithm. Henceforth, this strategy
will be termed ZFDPC-SUS. To the best of our knowledge this scheme has not been considered before.
We note, however, that it is an extension of the ZFDPC strategy considered in [5, 10, 18] to account for
multiple receive antennas, and also a variation of the algorithm discussed briefly in [13, Sect. VIII].

Let $\boldsymbol{\Xi}_{\pi,d} = \mathbf{L}_{\pi,d}\mathbf{Q}_{\pi,d}$ denote the QR decomposition of $\boldsymbol{\Xi}_{\pi,d}$, where $\mathbf{L}_{\pi,d}$ is an $L \times L$ lower triangular
matrix with $(i,j)$-th entry $l_{i,j}$, and $\mathbf{Q}_{\pi,d} = [\mathbf{q}_1^T, \dots, \mathbf{q}_L^T]^T$ is an $L \times M$ matrix with orthonormal rows ($\mathbf{q}_i$
denotes the $i$-th row vector). The transmit precoder matrix is chosen as

$$ \mathbf{W} = \mathbf{Q}^H_{\pi,d}. \tag{6} $$

Then, (5) yields a set of interference channels

$$ r_{\pi(i),d_i} = \sqrt{\lambda_{\pi(i),d_i}}\, l_{i,i}\sqrt{p_i}\, x_i + \sqrt{\lambda_{\pi(i),d_i}} \sum_{j<i} l_{i,j}\sqrt{p_j}\, x_j + n_{\pi(i),d_i}, \quad i = 1, \dots, L. \tag{7} $$

From (7), if $i < j$, there is no interference at receiver $\pi(i)$ from data stream $j$. For $i > j$, the interference
term $\sqrt{\lambda_{\pi(i),d_i}} \sum_{j<i} l_{i,j}\sqrt{p_j}\, x_j$ is precanceled at the transmitter by using DPC. Then, the output SNR at receiver
$\pi(i)$ for data stream $i$ is given by

$$ \zeta_{\pi(i),d_i} = p_i \gamma_{\pi(i),d_i} \tag{8} $$

where $\gamma_{\pi(i),d_i} = \lambda_{\pi(i),d_i}\beta_i$ with $\beta_i = |l_{i,i}|^2$.
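The triangular structure behind (6) and (7) can be illustrated with a short NumPy sketch. It assumes the selected right singular vectors are stacked as unit-norm rows of a matrix (called `Xi` here, a hypothetical name) and obtains the lower-triangular factor by applying `numpy.linalg.qr` to the conjugate transpose; random unit-norm rows stand in for actual singular vectors.

```python
import numpy as np

def zfdpc_precoder(Xi):
    """Factor Xi = L @ Q with L lower triangular and Q having orthonormal rows."""
    Qt, R = np.linalg.qr(Xi.conj().T)      # Xi^H = Qt @ R, R upper triangular
    return R.conj().T, Qt.conj().T         # L = R^H, Q = Qt^H

rng = np.random.default_rng(1)
M, n_streams = 4, 3                        # hypothetical sizes
A = rng.standard_normal((n_streams, M)) + 1j * rng.standard_normal((n_streams, M))
Xi = A / np.linalg.norm(A, axis=1, keepdims=True)   # unit-norm rows

Lmat, Q = zfdpc_precoder(Xi)
W = Q.conj().T                             # precoder W = Q^H
eff = Xi @ W                               # effective channel seen by the streams
beta = np.abs(np.diag(Lmat)) ** 2          # per-stream projection gains |l_ii|^2

print(np.allclose(eff, Lmat))              # effective channel is lower triangular
```

Stream i therefore only sees interference from streams j < i, which is the part removed by DPC, and its useful gain is the squared diagonal entry of the triangular factor.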

Given the optimal user set $\pi$ and the corresponding eigen-channel set $d$, the sum rate has the form

$$ R_{\text{ZFDPC-SUS}} = \max_{\{p_i\}:\ \sum_{i=1}^{L} p_i \le P}\ \sum_{i=1}^{L} \log_2\left(1 + p_i \gamma_{\pi(i),d_i}\right). \tag{9} $$

To maximize (9), the power should be allocated according to the standard water-filling algorithm.
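For reference, the water-filling step can be sketched in a few lines of plain Python. This is a generic textbook water-filling routine, not code from the paper; the function name `water_filling` and the example gains are illustrative, and all gains are assumed strictly positive.

```python
def water_filling(gains, P):
    """Maximize sum(log2(1 + p_i * g_i)) subject to sum(p_i) <= P, p_i >= 0."""
    order = sorted(range(len(gains)), key=lambda i: gains[i], reverse=True)
    inv = [1.0 / gains[i] for i in order]          # inverse gains, best first
    for k in range(len(gains), 0, -1):
        level = (P + sum(inv[:k])) / k             # water level with k active streams
        if level > inv[k - 1]:                     # weakest active stream gets p > 0
            break
    p = [0.0] * len(gains)
    for i in order[:k]:
        p[i] = level - 1.0 / gains[i]
    return p

p = water_filling([2.0, 1.0, 0.1], P=1.0)
print(p)   # the weakest stream is switched off in this example
```

With gains 2, 1 and 0.1 and unit total power, the two strongest streams share the power and the third receives none.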

Now consider the problem of selecting the optimal user set $\pi$ and corresponding eigen-mode index
set $d$. These sets are chosen to maximize the sum rate, given by (9). When $M < K$, to find the
optimal solution, one must apply an exhaustive search over all possible $L$, and for each $L$, over all
possible sets of $L$ subchannels taken from the set of $\sum_{k=1}^{K} \min\{M, N_k\}$ available eigen-channels spanned
by all $K$ users. Thus, the total number of possible user and eigen-channel selection sets is given by
$\sum_{L=1}^{M} \binom{\sum_{k=1}^{K} \min\{M, N_k\}}{L}$. Further, since different orderings of a given set will yield different output SNRs,
all permutations of a given set must also be considered. Clearly, the complexity associated with this
exhaustive search is computationally prohibitive in practice, for all but small values of $K$.
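To give a sense of scale, the count of candidate sets described above can be evaluated directly. The sketch below assumes the count is the sum over L = 1, ..., M of the number of L-subsets of the available eigen-channels, as the description suggests, and ignores the additional orderings; the function name and the example sizes are illustrative.

```python
from math import comb

def num_selection_sets(M, N_list):
    """Number of unordered eigen-channel subsets of size 1..M."""
    total = sum(min(M, n) for n in N_list)      # available eigen-channels
    return sum(comb(total, L) for L in range(1, M + 1))

# Hypothetical system: M = 4 transmit antennas, K = 20 users with N = 2 antennas
print(num_selection_sets(4, [2] * 20))          # → 102090
```

Even this modest configuration gives over one hundred thousand subsets before orderings are taken into account.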

Here we consider a user and eigen-mode selection algorithm with significantly lower complexity, based
on SUS. This algorithm, which was first presented in [13] in the context of ZFBF, iteratively selects a
user-eigenmode index pair by searching for a set of users with near orthogonal channel vectors, and is
described as follows. Let $\mathcal{U}_n$ denote the candidate set at the $n$-th iteration. This set contains the indices
of all users and the corresponding eigen-channels that have not been selected previously, and which have
not been pruned in the previous iterations (i.e., they have satisfied the "semi-orthogonality criteria" in
each of the previous iterations). Also, let $\mathcal{S}_n = \{(\pi(1), d_1), \dots, (\pi(n), d_n)\}$ denote the set of indices of
the selected users and the corresponding eigen-channels after the $n$-th iteration.
ZFDPC-SUS (Algorithm 1)

1) Initialization:

Set $n = 1$ and $\mathcal{U}_1 = \{(k,j) \mid k = 1, 2, \dots, K;\ j = 1, 2, \dots, \min(N_k, M)\}$.

Let $\gamma_{k,j}(1) = \lambda_{k,j}$. The transmitter selects the first user and eigen-channel pair as follows:

$$ (\pi(1), d_1) = \arg\max_{(k,j) \in \mathcal{U}_1} \gamma_{k,j}(1). \tag{10} $$

Set $\mathcal{S}_1 = \{(\pi(1), d_1)\}$, and define $\mathbf{q}_1 = \mathbf{v}^H_{\pi(1),d_1}$.

2) While $n < M$, $n \leftarrow n + 1$.
Calculate candidate set as

$$ \mathcal{U}_n = \left\{(k,j) \mid (k,j) \in \mathcal{U}_{n-1},\ (k,j) \ne (\pi(n-1), d_{n-1}),\ |\mathbf{v}^H_{k,j}\mathbf{q}^H_{n-1}|^2 < \delta\right\} $$

where $\delta$ is a positive constant, termed the semi-orthogonality parameter, that is preset before the
start of the selection procedure.

If $\mathcal{U}_n$ is empty, set $n = n - 1$ and go to step 3). Otherwise, for each $(k,j) \in \mathcal{U}_n$, denote

$$ l_i = \mathbf{v}^H_{k,j}\mathbf{q}_i^H, \quad i = 1, \dots, n-1 \tag{11} $$

$$ \boldsymbol{\xi}_{k,j} = \mathbf{v}^H_{k,j} - \sum_{i=1}^{n-1} l_i \mathbf{q}_i \tag{12} $$

$$ \gamma_{k,j}(n) = \lambda_{k,j}\|\boldsymbol{\xi}_{k,j}\|^2. \tag{13} $$

Select the $n$-th active user and corresponding eigen-channel as follows:

$$ (\pi(n), d_n) = \arg\max_{(k,j) \in \mathcal{U}_n} \gamma_{k,j}(n). \tag{14} $$

Set

$$ \mathcal{S}_n = \mathcal{S}_{n-1} \cup \{(\pi(n), d_n)\}, \qquad \mathbf{q}_n = \frac{\boldsymbol{\xi}_{\pi(n),d_n}}{\|\boldsymbol{\xi}_{\pi(n),d_n}\|}. \tag{15} $$

3) The transmitter informs the selected users of the indices of their selected eigen-channels; then
performs DPC, beamforming, and water-filling power allocation, as discussed previously.

Note that this procedure applies Gram-Schmidt orthogonalization to the ordered rows of $\boldsymbol{\Xi}_{\pi,d}$, as described
by (11), (12) and (15). As such, it also computes the required transmit precoding matrix in (6).
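The selection procedure can be sketched compactly in NumPy. This is an illustrative reading of Algorithm 1 under stated assumptions, not the authors' implementation: the function name `zfdpc_sus`, the dictionary-based candidate set, and the example sizes (K = 50, N = 2, M = 4, δ = 0.3) are all hypothetical, and the DPC and power-allocation steps are omitted.

```python
import numpy as np

def zfdpc_sus(H_list, delta):
    """Greedy SUS over (user, eigen-channel) pairs.

    Returns the selected pairs and the orthonormal rows q_1, ..., q_n.
    """
    M = H_list[0].shape[1]
    cand = {}                                   # candidate set U_1
    for k, H in enumerate(H_list):
        _, s, Vh = np.linalg.svd(H)
        for j in range(min(H.shape)):
            cand[(k, j)] = (s[j] ** 2, Vh[j])   # eigenvalue and row vector v^H
    selected, Q = [], []
    for n in range(1, M + 1):
        if n > 1:                               # semi-orthogonality pruning
            q_prev = Q[-1]
            cand = {key: (lam, vh) for key, (lam, vh) in cand.items()
                    if key != selected[-1]
                    and abs(vh @ q_prev.conj()) ** 2 < delta}
            if not cand:
                break
        best = None
        for key, (lam, vh) in cand.items():
            # Gram-Schmidt residual of v^H against q_1..q_{n-1}, cf. (11)-(12)
            xi = vh - sum((vh @ q.conj()) * q for q in Q)
            gamma = lam * np.linalg.norm(xi) ** 2        # metric in (13)
            if best is None or gamma > best[1]:
                best = (key, gamma, xi)
        selected.append(best[0])                         # selection rule (14)
        Q.append(best[2] / np.linalg.norm(best[2]))      # normalization (15)
    return selected, np.array(Q)

rng = np.random.default_rng(4)
K, N, M = 50, 2, 4
H_list = [(rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))) / np.sqrt(2)
          for _ in range(K)]
selected, Q = zfdpc_sus(H_list, delta=0.3)
print(selected)
```

Because pruning at iteration n only tests against the most recently added row, each surviving candidate has passed the test against every earlier row as well, so the cost per iteration shrinks as the candidate set is thinned.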
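Observe the following important relations. According to the QR decomposition of $\boldsymbol{\Xi}_{\pi,d}$,

$$ l_{n,n}\mathbf{q}_n = \mathbf{v}^H_{\pi(n),d_n} - \sum_{j=1}^{n-1} l_{n,j}\mathbf{q}_j \tag{16} $$

and $l_{n,j} = \mathbf{v}^H_{\pi(n),d_n}\mathbf{q}_j^H$, for $j < n$. With (12),

$$ |l_{n,n}|^2 = \|\boldsymbol{\xi}_{\pi(n),d_n}\|^2. \tag{17} $$

In addition, since $\|\mathbf{v}_{\pi(n),d_n}\|^2 = 1$ and $\mathbf{q}_i$, $i = 1, \dots, L$ are orthonormal, it can be easily shown that

$$ \sum_{j=1}^{n} |l_{n,j}|^2 = 1, \quad \text{for } n = 1, 2, \dots, L. \tag{18} $$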



B. Zero-Forcing Beamforming Algorithm 

The ZFDPC approach described in the previous section has significantly lower complexity than full
(capacity-achieving) DPC; however, it is still a nonlinear processing strategy, due to the interference
cancellation step. Thus, a common method for reducing complexity even further is to remove the
interference cancellation and employ linear processing (linear beamforming). It is well-known, however, that
establishing the optimal linear beamforming vectors is a very difficult non-convex optimization problem
[19]. Instead, sub-optimal but simple linear processing schemes are usually adopted. Here we will study
ZFBF, which is one of the most popular linear strategies. Unless otherwise indicated, we will employ the
same notational symbols as used in the previous sections.

Let $\mathbf{C}^{\dagger}_{\pi,d}$ denote the Moore-Penrose inverse of the equivalent channel matrix $\mathbf{C}_{\pi,d}$, i.e., $\mathbf{C}^{\dagger}_{\pi,d} =
\mathbf{C}^H_{\pi,d}(\mathbf{C}_{\pi,d}\mathbf{C}^H_{\pi,d})^{-1}$, and define $\mathbf{c}_1, \dots, \mathbf{c}_L$ as the columns of $\mathbf{C}^{\dagger}_{\pi,d}$. For ZFBF, the precoding matrix
$\mathbf{W} = [\mathbf{w}_1, \dots, \mathbf{w}_L]$ is constructed with the beamforming vectors $\mathbf{w}_i = \mathbf{c}_i / \|\mathbf{c}_i\|$, for $i = 1, \dots, L$. Note that
this direct implementation of ZFBF requires the explicit computation of the Moore-Penrose inverse of
the channel matrix in order to obtain the beamforming vectors. It has been shown in [18], however, that
this direct calculation can be circumvented, thereby significantly reducing the computational complexity.
To this end, it is convenient to rewrite the decomposition of $\mathbf{C}_{\pi,d}$ as $\mathbf{C}_{\pi,d} = \boldsymbol{\Lambda}^{1/2}\mathbf{L}_{\pi,d}\mathbf{Q}_{\pi,d}$, where
$\boldsymbol{\Lambda} = \mathrm{diag}\{\lambda_{\pi(1),d_1}, \dots, \lambda_{\pi(L),d_L}\}$ and $\mathbf{L}_{\pi,d}$, $\mathbf{Q}_{\pi,d}$ are defined as in Section III-A. Letting $\mathbf{T}_{\pi,d} = \mathbf{L}^{-1}_{\pi,d} =
[\mathbf{t}_1, \dots, \mathbf{t}_L]$, and assuming that $\mathbf{C}_{\pi,d}$ has full row rank, the Moore-Penrose inverse $\mathbf{C}^{\dagger}_{\pi,d}$ can be written as

$$ \mathbf{C}^{\dagger}_{\pi,d} = \mathbf{Q}^H_{\pi,d}\mathbf{T}_{\pi,d}\boldsymbol{\Lambda}^{-1/2}. \tag{19} $$

Note that calculating the inverse of $\boldsymbol{\Lambda}$ is trivial (since it is diagonal), whereas the inverse of $\mathbf{L}_{\pi,d}$ can
be computed using a simple iterative algorithm given in [18, Eq. 11].
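The factorized pseudo-inverse can be checked against a direct computation. The NumPy sketch below assumes the composite channel factors as a diagonal matrix of square-root eigenvalues times a lower-triangular factor times a matrix with orthonormal rows; random unit-norm rows and eigenvalues stand in for an actual selected set, `numpy.linalg.inv` is used in place of the iterative inverse of [18], and the per-stream gain `gain` is derived here from the factorization rather than quoted from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
M, n_streams = 4, 3                                   # hypothetical sizes
lam = rng.uniform(0.5, 3.0, size=n_streams)           # stand-in eigenvalues
A = rng.standard_normal((n_streams, M)) + 1j * rng.standard_normal((n_streams, M))
Xi = A / np.linalg.norm(A, axis=1, keepdims=True)     # unit-norm rows
C = np.sqrt(lam)[:, None] * Xi                        # composite channel

Qt, R = np.linalg.qr(Xi.conj().T)                     # Xi^H = Qt R
Lmat, Q = R.conj().T, Qt.conj().T                     # Xi = Lmat Q
T = np.linalg.inv(Lmat)                               # inverse of triangular factor

# Pseudo-inverse from the factorization: Q^H T Lambda^{-1/2}
C_pinv = (Q.conj().T @ T) / np.sqrt(lam)[None, :]
W = C_pinv / np.linalg.norm(C_pinv, axis=0, keepdims=True)   # unit-norm beams
gain = lam / np.linalg.norm(T, axis=0) ** 2           # effective gain per stream

print(np.allclose(C_pinv, np.linalg.pinv(C)))
print(np.allclose(C @ W, np.diag(np.sqrt(gain))))     # interference-free channel
```

The second check shows that, after column normalization, the effective channel is diagonal with entries equal to the square root of each eigenvalue divided by the squared norm of the matching column of the inverse triangular factor.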

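For ZFBF, the decoded signal for data stream $\pi(i)$ is easily shown to be given by

$$ r_{\pi(i),d_i} = \frac{\sqrt{p_i \lambda_{\pi(i),d_i}}}{\|\mathbf{t}_i\|}\, x_i + n_{\pi(i),d_i} \tag{20} $$

with corresponding SNR

$$ \theta_{\pi(i),d_i} = \frac{p_i \lambda_{\pi(i),d_i}}{\|\mathbf{t}_i\|^2}. \tag{21} $$

For the given user set $\pi$ and the corresponding eigen-channel set $d$, the sum rate is given by

$$ R_{\text{ZFBF-SUS}} = \max_{\{p_i\}:\ \sum_{i=1}^{L} p_i \le P}\ \sum_{i=1}^{L} \log_2\left(1 + \theta_{\pi(i),d_i}\right) \tag{22} $$

where the optimal power allocation $\{p_i\}_{i=1}^{L}$ is obtained, once again, by applying the waterfilling procedure.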

For ZFBF, we consider a user and eigen-channel selection algorithm based on SUS, following the
same general procedure as in Algorithm 1. Note that SUS has previously been applied to ZFBF in
[13]. This algorithm typically assumes that each user is equipped with a single receive antenna; however,
it extends easily to the multiple receive antenna scenario considered in this paper. One key difference
between the algorithms in [11, 13, 18] is the specific method employed for selecting the "best" user
in Step 2 of the algorithm. More specifically, in [13], the same method was applied as in (14), whereas
[11] applied a method based on selecting one user at each iteration that results in the largest sum rate
when combined with previously selected users. Whilst the latter method can result in larger sum rate,
here we will consider the former method for analytical tractability. It has been shown, however, that
the difference in sum rate between these two methods is minor [18].

IV. Sum Rate Analysis - Asymptotic K 

In this section, we investigate the average sum rate of each of the above transceiver structures. For
tractability, we make the following assumptions throughout this section:

(i) For each user, only the principal eigen-channel is considered. As such, we drop the indices for the
selected eigen-channels (for example, we use $\gamma_{\pi(i)}$ instead of $\gamma_{\pi(i),d_i}$).

(ii) The available power $P$ is divided equally amongst the active users³.

Clearly, the sum rate achieved under these two assumptions will serve as a lower bound to the maximum
achievable sum rate. We will also assume that each user has $N$ antennas, and that there are $L = M$ data
streams.

We will investigate the average sum rate of both schemes discussed in the previous section. We focus
on establishing asymptotic results as $K \to \infty$, whilst keeping SNR, $M$, and $N$ fixed.

A. ZFDPC-SUS Scheme

To analyze the sum rate of the ZFDPC-SUS system, we require the distribution of the output SNR $\zeta_{\pi(n)}$,
or alternatively the distribution of $\gamma_{\pi(n)}$. Let us first determine the distribution of $\gamma_k(n)$, $n = 1, \dots, M$,
where $k$ is an arbitrary user selected from the candidate set $\mathcal{U}_n$.

³Note that in practice the transmit power may be optimized (e.g., according to the water-filling strategy). In such cases, the
power allocation depends on the instantaneous channel coefficients and thus changes at the fading rate of the channel, which
makes the analysis intractable.

Starting with $n = 1$, $\gamma_k(1)$, $k = 1, \dots, K$, are independent and identically distributed (i.i.d.), with

$$ \gamma_k(1) = \lambda_{k,\max} \tag{23} $$

where $\lambda_{k,\max}$ is the maximum eigenvalue of $\mathbf{H}_k^H\mathbf{H}_k$, whose probability density function (p.d.f.) and
cumulative distribution function (c.d.f.) are known in closed-form and are given as follows [20]:

Lemma 1: Let $\mathbf{H} \sim \mathcal{CN}_{N,M}(\mathbf{0}_{N,M}, \mathbf{I}_N \otimes \mathbf{I}_M)$. The matrix $\mathbf{H}^H\mathbf{H}$ is complex Wishart, whose maximum
eigenvalue has p.d.f.

$$ f_{\max}(x) = \sum_{r=1}^{p}\ \sum_{s=q-p}^{(p+q-2r)r} a_{r,s}\, x^s e^{-rx} \tag{24} $$

and c.d.f.

$$ F_{\max}(x) = \sum_{r=1}^{p}\ \sum_{s=q-p}^{(p+q-2r)r} \frac{a_{r,s}}{r^{s+1}}\, \gamma(s+1, rx) \tag{25} $$

where $p = \min\{M, N\}$, $q = \max\{M, N\}$, $a_{r,s}$ is a constant (dependent on $M$ and $N$) which can be
computed using the simple numerical method in [21], and $\gamma(\cdot, \cdot)$ is the lower incomplete gamma function.
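As a sanity check on the form of this lemma, the case M = N = 2 can be compared against simulation. The sketch below assumes the coefficient set `A_2X2`, which was worked out by hand from the standard 2×2 closed form $F_{\max}(x) = 1 - (x^2 + 2)e^{-x} + e^{-2x}$ and is not taken from [21]; the function names are illustrative, and the incomplete gamma helper only covers integer orders.

```python
import math
import numpy as np

def lower_inc_gamma_int(n, y):
    """Lower incomplete gamma function gamma(n, y) for integer n >= 1."""
    s = n - 1
    partial = sum(y ** m / math.factorial(m) for m in range(s + 1))
    return math.factorial(s) * (1.0 - math.exp(-y) * partial)

def cdf_max_eig(x, coeffs):
    """Evaluate a double-sum c.d.f. of the lemma's form from {(r, s): a_rs}."""
    return sum(a / r ** (s + 1) * lower_inc_gamma_int(s + 1, r * x)
               for (r, s), a in coeffs.items())

# Assumed coefficients for M = N = 2 (p = q = 2): r = 1 has s = 0..2, r = 2 has s = 0
A_2X2 = {(1, 0): 2.0, (1, 1): -2.0, (1, 2): 1.0, (2, 0): -2.0}

rng = np.random.default_rng(3)
trials = 20000
H = (rng.standard_normal((trials, 2, 2)) + 1j * rng.standard_normal((trials, 2, 2))) / np.sqrt(2)
lam_max = np.linalg.svd(H, compute_uv=False)[:, 0] ** 2   # largest eigenvalue of H^H H

x0 = 3.0
empirical = float(np.mean(lam_max <= x0))
model = cdf_max_eig(x0, A_2X2)
print(empirical, model)
```

The index ranges in the dictionary match the summation limits of the lemma for p = q = 2, which is a useful check when coefficients are generated numerically for larger antenna counts.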

For $n \ge 2$, evaluating the distribution of $\gamma_k(n)$, $k \in \mathcal{U}_n$, is significantly more challenging. Particularly,
the "max" operation (10) of Step 1 of the previous iteration (i.e., the $(n-1)$-th), and also the
semi-orthogonality constraint imposed at Step 2 of the current iteration (i.e., the $n$-th) will make the exact
distribution of the eigen-channel vectors in $\mathcal{U}_n$ different from the distributions of the eigen-channel vectors
in $\mathcal{U}_l$, $l \le n - 1$. More specifically, for $n \ge 2$, the eigen-channels for users in the candidate set $\mathcal{U}_n$ are
no longer distributed according to the maximum eigen-channel of a complex Wishart matrix (i.e., for
$k \in \mathcal{U}_n$, $\mathbf{v}_k$ is no longer an isotropically distributed unit vector on the complex unit sphere, and $\lambda_{k,\max}$
is no longer distributed as the maximum eigenvalue of a complex Wishart matrix).

We see from (13) that $\gamma_k(n)$ involves the product of $\lambda_{k,\max}$ and the projection variable $\|\boldsymbol{\xi}_k\|^2$. For the
reasons stated above, the exact distributions of both $\lambda_{k,\max}$ and $\|\boldsymbol{\xi}_k\|^2$ for $k \in \mathcal{U}_n$, $n \ge 2$ are currently
unknown and appear very difficult to derive analytically. Fortunately, we can make progress by appealing
to the "large-user" regime. In particular, when the number of users in the candidate set $\mathcal{U}_n$ is large,
the problem is greatly simplified by invoking the following key lemma, which shows that removing a
finite number of users from $\mathcal{U}_n$ has negligible impact on the statistical properties of the remaining users.
Similar results have also been established previously for different system configurations [11, 13, 18].

Lemma 2: At the $n$-th iteration, $2 \le n \le M$, conditioned on the previously selected eigen-channel
vectors $\mathbf{c}_{\pi(1)}, \dots, \mathbf{c}_{\pi(n-1)}$, the eigen-channel vectors in $\mathcal{U}_n$ are i.i.d. Furthermore, as the size of the
candidate user set $\mathcal{U}_n$ grows large (i.e. $\lim_{K \to \infty} |\mathcal{U}_n| = \infty$), conditioned on the previously selected
eigen-channels $\mathbf{c}_{\pi(1)}, \dots, \mathbf{c}_{\pi(n-1)}$, the eigen-channel for each user in $\mathcal{U}_n$ converges in distribution to the
distribution of the principal eigen-channel of a complex Wishart matrix.

Proof: See Appendix A. ■

Note that our result here differs from that of [18] in both the distribution of the channel vectors and
also the user selection algorithm.

Equipped with Lemma 2, at the $n$-th iteration, from the point of view of the users in $\mathcal{U}_n$, the
eigen-channel vectors of the selected users in the previous iterations (i.e., $\mathbf{c}_{\pi(1)}, \dots, \mathbf{c}_{\pi(n-1)}$) appear to be
randomly selected. Thus, the orthonormal basis $\mathbf{q}_1, \dots, \mathbf{q}_{n-1}$ (generated from $\mathbf{c}_{\pi(1)}, \dots, \mathbf{c}_{\pi(n-1)}$) appears
independent of the eigen-channel vectors of the users in $\mathcal{U}_n$. This greatly simplifies the following analysis.

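We require the exact distribution of $\gamma_k(n) = \lambda_{k,\max}\|\boldsymbol{\xi}_k\|^2$. To this end, the major challenge is to
derive the c.d.f. of $\beta_k(n) = \|\boldsymbol{\xi}_k\|^2$ for an arbitrary user $k \in \mathcal{U}_n$, i.e. $F_{\beta(n)}(x) = \Pr(\beta_k(n) \le x \mid k \in \mathcal{U}_n)$.

Recalling that $l_{n,j} = \mathbf{v}^H_{\pi(n)}\mathbf{q}_j^H$ for $j < n$, with (17) and (18), we can re-express this c.d.f. as follows:

$$ F_{\beta(n)}(x) = \Pr\left(|\mathbf{v}_k^H\mathbf{q}_n^H|^2 \le x \,\middle|\, |\mathbf{v}_k^H\mathbf{q}_1^H|^2 < \delta, \dots, |\mathbf{v}_k^H\mathbf{q}_{n-1}^H|^2 < \delta\right)
= \frac{\Pr\left(\sum_{i=1}^{n-1} |\mathbf{v}_k^H\mathbf{q}_i^H|^2 \ge 1 - x,\ |\mathbf{v}_k^H\mathbf{q}_1^H|^2 < \delta, \dots, |\mathbf{v}_k^H\mathbf{q}_{n-1}^H|^2 < \delta\right)}{\Pr\left(|\mathbf{v}_k^H\mathbf{q}_1^H|^2 < \delta, \dots, |\mathbf{v}_k^H\mathbf{q}_{n-1}^H|^2 < \delta\right)}. \tag{26} $$

The denominator, $\mu_n(\delta) = \Pr(|\mathbf{v}_k^H\mathbf{q}_1^H|^2 < \delta, \dots, |\mathbf{v}_k^H\mathbf{q}_{n-1}^H|^2 < \delta)$, denotes the probability that any
arbitrary user $k \in \{1, \dots, K\}$ will belong to the set $\mathcal{U}_n$. Note that this probability has also been considered
in the context of ZFBF for the MIMO broadcast channel in [13], where a rather loose lower bound was
derived. Here we derive an exact expression which applies for large $K$, using an alternative derivation
approach. For tractability, our result applies for $\delta < \frac{1}{M-1}$, which is easy to establish.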

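Lemma 3: With sufficiently large $K$ and $\delta < \frac{1}{M-1}$, the probability that an arbitrary user $k \in \{1, \dots, K\}$
belongs to the set $\mathcal{U}_n$, for $n \in \{2, \dots, M\}$, is given by

$$ \mu_n(\delta) = \Pr\left(|\mathbf{v}_k^H\mathbf{q}_1^H|^2 < \delta, \dots, |\mathbf{v}_k^H\mathbf{q}_{n-1}^H|^2 < \delta\right)
= \sum_{k=n-1}^{M-1} \binom{M-1}{k} \left[\sum_{l=0}^{n-1} (-1)^l \binom{n-1}{l} l^k\right] (-\delta)^k. \tag{27} $$

Proof: See Appendix B. ■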
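The probability in Lemma 3 is easy to probe numerically. The sketch below assumes an isotropic unit vector, for which the squared projections onto an orthonormal basis are uniformly distributed on the simplex; under that assumption inclusion-exclusion gives $\mu_n(\delta) = \sum_{l=0}^{n-1} (-1)^l \binom{n-1}{l} (1 - l\delta)^{M-1}$ whenever $(n-1)\delta \le 1$. This closed form is a standard consequence of the assumption and is used here for illustration; the function names and parameter values are hypothetical.

```python
from math import comb
import numpy as np

def mu(n, M, delta):
    """Probability that n-1 squared projections of an isotropic unit vector
    in C^M all fall below delta (valid when (n-1)*delta <= 1)."""
    return sum((-1) ** l * comb(n - 1, l) * (1 - l * delta) ** (M - 1)
               for l in range(n))

def mu_mc(n, M, delta, trials=200000, seed=0):
    """Monte Carlo estimate using the standard basis as q_1, ..., q_{n-1}."""
    rng = np.random.default_rng(seed)
    g = rng.standard_normal((trials, M)) + 1j * rng.standard_normal((trials, M))
    t = np.abs(g) ** 2
    t /= t.sum(axis=1, keepdims=True)          # squared entries of a unit vector
    return float(np.mean(np.all(t[:, :n - 1] < delta, axis=1)))

M, delta, K = 4, 0.3, 1000
for n in range(2, M + 1):
    # expected candidate-set size after n-1 pruning steps is roughly K * mu
    print(n, mu(n, M, delta), mu_mc(n, M, delta), K * mu(n, M, delta))
```

For n = 2 the expression reduces to $1 - (1 - \delta)^{M-1}$, and the product with K gives a quick estimate of how many candidates survive each pruning step for a chosen δ.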

Note that the term "sufficiently large" in Lemma 3 implies that $K$ should be large enough such that:

$$ \mathcal{K}_n = |\mathcal{U}_n| \approx K\mu_n(\delta) \tag{28} $$

due to the law of large numbers (LLN). In fact, this also places an additional requirement on $\delta$, which
must be selected such that as $K \to \infty$, $|\mathcal{U}_n|$ becomes sufficiently large (e.g. such that $\lim_{K \to \infty} |\mathcal{U}_n| = \infty$).
More specifically, since $\delta < 1$, by examining (28) and (27) and recalling the condition on $\delta$ in the lemma
statement, we can establish the following design criterion: $\delta$ should be chosen such that

$$ \lim_{K \to \infty} K\delta^{M-1} = \infty \quad \text{and} \quad \delta < \frac{1}{M-1}. \tag{29} $$

This implies that any $\delta$ can be selected, as long as it does not approach zero at a rate of $1/K^{\frac{1}{M-1}}$ or
faster as $K \to \infty$, whilst also meeting the technical condition $\delta < \frac{1}{M-1}$. These are very mild conditions
which are easy to satisfy (for example, choosing $\delta$ to be any constant less than $\frac{1}{M-1}$). We further discuss
the design implications of selecting $\delta$ in Section IV-C.

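The numerator in (26) can be evaluated using similar methods, which leads to the following result:
Lemma 4: Let $k \in \mathcal{U}_n$, $n \in \{2, \dots, M\}$, and assume $\delta$ is chosen to satisfy (29). For sufficiently large
$K$, the c.d.f. of $\beta_k(n)$, given in (26), can be expressed as follows:

$$ F_{\beta(n)}(x) = \begin{cases}
0, & x < 1 - (n-1)\delta \\
1 - \dfrac{\Gamma(M)}{\Gamma(M-n+1)\,\mu_n(\delta)} \displaystyle\int \cdots \int \Big(1 - \sum_{i=1}^{n-1} t_i\Big)^{M-n} dt_1 \cdots dt_{n-1}, & 1 - (n-1)\delta \le x < 1 \\
1, & x \ge 1
\end{cases} \tag{30} $$

where the integral region is given by $t_i \in \left[0, \min\left\{\delta,\ 1 - x - \sum_{j=i+1}^{n-1} t_j\right\}\right]$.

For $n = 2$, (30) has the closed-form solution

$$ F_{\beta(2)}(x) = \begin{cases}
0, & x < 1 - \delta \\
1 - \dfrac{1 - x^{M-1}}{1 - (1-\delta)^{M-1}}, & 1 - \delta \le x < 1 \\
1, & x \ge 1
\end{cases} \tag{31} $$

Proof: See Appendix C. ■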
For arbitrary M and n, it is difficult to obtain an exact closed-form solution for this c.d.f. Based on the 
above lemma, however, we can derive closed-form upper and lower bounds, as given by the following: 
Lemma 5: The c.d.f. Fg(-„)(x), for n G {2, • • • ,M}, satisfies F^j^^-j{x) < < F^^^^{x), with 

^^(n)(^) ^^'^ P?{n){x) given by (32) and (33) 




1 
1 



X < 1 - (n - 

\- {n-\)b <x<\ 

x>\ 



(32)
and



X < 1 - (n - 1)5 

1 _ 1 - (n - < X < 1 

1 X > 1 



(33) 



respectively, where $\mu_n(\cdot)$ is given by (27) and $I_x(\cdot, \cdot)$ is the regularized incomplete beta function.



Note that for $n = 2$, both bounds coincide with $F_{\beta(n)}(x)$.



Proof: See Appendix D. ■ 

Equipped with Lemma 5, and with the help of Lemma 1 , we may now derive upper and lower bounds 
on the c.d.f. of 7fe(ra). To establish this result, recall that for an arbitrary user k G Un, n > 2, then 
7fc(n) = Xk,miaPk{n). Also, define %{n) = \k,ms^Pk{n) and %{n) = Afe,max/3fe(n), with c.d.f.s Kj,(n)(a;) 
and F;y(„)(a;) respectively. 

Lemma 6: The c.d.f. F^(„)(a;), for n G {2, • • • ,M}, satisfies F;y(„)(x) < F^(„)(x) < F;y(„)(x), with 
^7(n)(^) ai^'i Fi{n){x) given by 

-, M-l /, , [n~l / / ■ \k 

^ , , ^ /X\ 1 (M-V 



Ain(5) 



E 

k=n—l 



k 



,1=0 



n- 1 



p (7V+M'-2r)r 

Yl Yl ""^'^ 

r=l s=q—p 



Y (^) r''-^-'-\-x)''-^ ^r{j -k + s + l,rx)-r(j-k + s + l, 

3=0 



rx\ 

T/J 



(34) 



p (A'+M-2r)r 



M-k-1 



M - /c - 1 



x(-a;) 



rx 



+ s - M + 2,rx) - T [j + s - M + 2,— 



„M-j-s-2 



(35) 



respectively, where $F_{\max}(\cdot)$, $p$, $q$ and $a_{r,s}$ are defined as in Lemma 1, $\tau = 1 - (n-1)\delta$ and $\Gamma(\cdot, \cdot)$ denotes
the upper incomplete gamma function.

For the case $n = 2$, both bounds coincide with $F_{\gamma(n)}(x)$.

Proof: See Appendix E. ■ 

Although not shown due to space limitations, these bounds have been confirmed through simulations. 

Recall that our primary aim is to characterize the distribution of $\zeta_{\pi(n)}$, or equivalently $\gamma_{\pi(n)}$ which,
from (14), is the maximum of a collection of i.i.d. random variables chosen from $\mathcal{U}_n$, with common c.d.f.
$F_{\gamma(n)}(x)$. Moreover, as discussed previously, our main interest is the case where the number of users $K$,
and consequently the size of $\mathcal{U}_n$, is large. As such, from the theory of extreme order statistics (see e.g.
[14, Appendix I] [22]), the asymptotic distribution of the largest order statistic $\gamma_{\pi(n)}$ depends on the tail
behavior (large $x$) of $F_{\gamma(n)}(x)$. For $n \ge 2$, the following closed-form asymptotic (high $x$) expansions for
the c.d.f. upper and lower bounds in (34) and (35) are derived in Appendix F:



+ 0(e-^x^+^-"-2) (36) 
+ 0(e-^x^+^-"-2) (37) 



where 



1 Tin) 



En r(M-n + l)r(Ar)(n- 1)"-!' 

1 1 



(38) 
(39) 



en r(M - n + i)r(iv) ■ 

Based on the above results, we can establish upper and lower bounds of the asymptotic distribution 

of 7^(„), for large K. To this end, define 7^(„) = maxfegj^^ % (n) and %(^ri) = T^^k&u„lk{n), with 
c.d.f.s F-y^,„)(.x) and Fy^^^^^{x) respectively. It is clear that F;y^^^^{x) < F^^^^^{x) < Fj^^^^{x), where the 
equalities hold when n = 1. Then, we have the following lemma: 
Lemma 7: The random variables %(^n) %{n)' n e {2, - ■ ■ , M}, satisfy 

Pr{u„ - log log VK < <Un + log log VK} 

>l-o(-^), (40) 



logi^: 

Pr{Xn - log log Vk < 7^(„) <Xn + log log VK} 



.(K\ ...... , iK 



where"* 

= log ( — ) + (M + AT - n - 1) loglog I — ) , (42) 

Xn = log + (M + AT - n - 1) log log . (43) 

Proof: This result is readily established by combining (36) and (37) with the extreme order statistics 
result given in^ [14, Lemma 7]. ■ 



'^Here log(-) represents the natural logarithm. 

'Note that there are some minor typographical errors with [14, Lemma 7]. Here we have adopted the correct results. 
For the case $n = 1$, $\tilde{\gamma}_{\pi(n)} = \gamma_{\pi(n)} = \hat{\gamma}_{\pi(n)}$, whose asymptotic distribution is [14]

$$\Pr\left\{u_1 - \log\log\sqrt{K} \leq \gamma_{\pi(1)} \leq u_1 + \log\log\sqrt{K}\right\} \geq 1 - O\left(\frac{1}{\log K}\right). \tag{44}$$

Interestingly, we can obtain the same result if we substitute $n = 1$ into (40)-(43). The asymptotic
distribution of $\zeta_{\pi(n)}$ follows from the above results.
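As a rough numerical illustration of the centering terms in Lemma 7, the following sketch evaluates $u_n$ and $\chi_n$ together with the half-width $\log\log\sqrt{K}$ of the concentration window. It assumes that the constants take the forms $1/\epsilon_n = \Gamma(n)/[\Gamma(M-n+1)\Gamma(N)(n-1)^{n-1}]$ and $1/\varepsilon_n = 1/[\Gamma(M-n+1)\Gamma(N)]$, with $(n-1)^{n-1}$ read as 1 for $n = 1$; the function names are hypothetical and chosen only for this example.

```python
import math

def inv_eps_lower(M, N, n):
    # Assumed form of 1/epsilon_n in (38); Python evaluates 0**0 as 1 for n = 1.
    return math.gamma(n) / (math.gamma(M - n + 1) * math.gamma(N) * (n - 1) ** (n - 1))

def inv_eps_upper(M, N, n):
    # Assumed form of 1/varepsilon_n in (39).
    return 1.0 / (math.gamma(M - n + 1) * math.gamma(N))

def center(K, inv_eps, M, N, n):
    # log(K/eps) + (M + N - n - 1) * log log(K/eps); requires K/eps > 1.
    a = math.log(K * inv_eps)
    return a + (M + N - n - 1) * math.log(a)

def u_n(K, M, N, n):
    return center(K, inv_eps_lower(M, N, n), M, N, n)

def chi_n(K, M, N, n):
    return center(K, inv_eps_upper(M, N, n), M, N, n)

def half_width(K):
    # log log sqrt(K), with natural logarithms
    return math.log(0.5 * math.log(K))

if __name__ == "__main__":
    K, M, N = 10**6, 4, 4
    for n in range(1, M + 1):
        print(n, round(u_n(K, M, N, n), 3), round(chi_n(K, M, N, n), 3), round(half_width(K), 3))
```

Under these assumptions the two centers coincide for $n = 1$ and $n = 2$, and differ only by a term that does not grow with $K$ faster than a constant for larger $n$.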
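Lemma 8: Let $\rho = \frac{P}{M}$. For $\zeta_{\pi(n)}$, $n \in \{1, \ldots, M\}$, we have

$$\Pr\left\{\varpi_n - \rho\log\log\sqrt{K} \leq \zeta_{\pi(n)} \leq \nu_n + \rho\log\log\sqrt{K}\right\} \geq 1 - O\left(\frac{1}{\log K}\right), \tag{45}$$

where

$$\varpi_n = \rho\log\left(\frac{K}{\epsilon_n}\right) + \rho(M+N-n-1)\log\log\left(\frac{K}{\epsilon_n}\right), \tag{46}$$

$$\nu_n = \rho\log\left(\frac{K}{\varepsilon_n}\right) + \rho(M+N-n-1)\log\log\left(\frac{K}{\varepsilon_n}\right). \tag{47}$$

Proof: See Appendix G. ■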
We can now prove the following theorem (see Appendix H), which presents a key contribution:

Theorem 1: For a fixed number of transmit antennas $M$ and receive antennas $N$, and fixed transmit
power $P$, if the semi-orthogonality parameter $\delta$ is chosen to satisfy (29), then the sum rate $R_{\text{ZFDPC-SUS}}$
of the proposed ZFDPC-SUS scheme satisfies

$$\lim_{K \to \infty} \frac{R_{\text{ZFDPC-SUS}}}{M \log_2[\rho \log K]} = 1 \tag{48}$$

with probability 1, where $\rho = P/M$. In addition,

$$\lim_{K \to \infty} \mathcal{E}\{R_{\text{BC}}\} - \mathcal{E}\{R_{\text{ZFDPC-SUS}}\} = 0, \tag{49}$$

where $R_{\text{BC}}$ denotes the sum rate of the MIMO broadcast channel, achieved with DPC. As $K \to \infty$, the
average sum rate difference between ZFDPC-SUS and DPC is no greater than $O\left(\frac{\log\log K}{\log K}\right)$.

Note that the sum rate difference convergence (49) is much stronger than the sum rate ratio convergence
in probability (48), since the latter does not preclude the existence of an infinite sum rate gap between
the proposed scheme and the optimal scheme.
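The distinction between ratio convergence and difference convergence can be made concrete with a small numerical sketch. The example below compares the scaling law $M \log_2[\rho \log K]$ with a purely hypothetical scheme that loses a constant `offset` bits/s/Hz relative to it; the offset, the default values of `M` and `rho`, and the function name are illustrative assumptions and do not correspond to any scheme analyzed in this paper.

```python
import math

def ratio_and_gap(K, M=4, rho=10.0, offset=3.0):
    """Return (rate ratio, rate gap) for a hypothetical scheme with a constant loss."""
    optimal = M * math.log2(rho * math.log(K))
    hypothetical = optimal - offset  # hypothetical scheme: constant rate loss
    return hypothetical / optimal, optimal - hypothetical

if __name__ == "__main__":
    for K in (10**2, 10**4, 10**8, 10**16):
        ratio, gap = ratio_and_gap(K)
        print(K, round(ratio, 4), gap)
```

The ratio creeps towards 1 as $K$ grows while the gap never shrinks, which is why a statement of the form (49) carries more information than one of the form (48).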
B. ZFBF-SUS Scheme

In this section, we will evaluate the performance of linear ZFBF with SUS. For our analysis, following
[13], we will assume that the criterion (14) is used at each iteration of the SUS algorithm to select the
best user. In [13], it has been proved that ZFBF-SUS can achieve the same asymptotic sum rate scaling
as DPC. Here we establish the stronger result that the average sum rate of ZFBF-SUS converges to
the average sum rate achieved with optimal DPC, which was not established in [13]. Deriving an exact
expression for the asymptotic distribution of the output SNR for each data stream, analogous to (45),
appears very difficult for ZFBF-SUS. Thus, here we adopt a different approach, based on first applying
a bound which relates the output SNR of ZFBF-SUS to the output SNR of ZFDPC-SUS,
and then applying results from the previous subsection. This leads to the following key theorem:

Theorem 2: For a fixed number of transmit antennas $M$ and receive antennas $N$, and fixed transmit
power $P$, if the semi-orthogonality parameter $\delta$ is chosen to satisfy (29), then the average sum rate
$\mathcal{E}\{R_{\text{ZFBF-SUS}}\}$ of the ZFBF-SUS scheme satisfies:

$$\lim_{K \to \infty} \mathcal{E}\{R_{\text{BC}}\} - \mathcal{E}\{R_{\text{ZFBF-SUS}}\} = 0. \tag{50}$$

As $K \to \infty$, the average sum rate difference between ZFBF-SUS and DPC is no greater than $O\left(\frac{\log\log K}{\log K}\right)$.
Proof: See Appendix I. ■
This result shows that, as for the ZFDPC-SUS scheme, we can significantly reduce the complexity of
the SUS search algorithm by choosing $\delta$ reasonably small, whilst at the same time achieving the optimal
asymptotic sum rate of DPC.

C. Discussion of Results 

Based on the analysis above, some interesting observations are in order.

1) Asymptotically, both schemes can achieve the maximum spatial multiplexing gain of $M$, and also
the maximum multi-user diversity gain up to first order (i.e. the SNR scales with $\log K$, and the
sum rate scales as $\log\log K$). For ZFBF, this scaling behavior agrees with previous results [15, 18].

2) As shown in Theorem 1 and Theorem 2, provided that the semi-orthogonality parameter $\delta$ is selected
appropriately, the asymptotic ergodic sum rates of both schemes converge to that of the MIMO
broadcast channel, and in both cases the difference in average sum rate with respect to optimal
DPC is no greater than $O\left(\frac{\log\log K}{\log K}\right)$. Note that similar scaling results have also been obtained for
other user selection schemes with ZFBF [15, 18].

3) In contrast to most related work, our results provide key insights into the effect of the SUS
semi-orthogonality parameter $\delta$ and the number of receive antennas $N$. Considering ZFDPC-SUS, from
(45) and the expressions for $\varpi_n$ in (46) and $\nu_n$ in (47), we see that imposing the constraint $\delta$ does
not reduce the multi-user diversity gain in either the first-order terms $O(\log K)$ or the second-order terms
$O(\log\log K)$. It appears that this result cannot be established based on previous (less accurate)
SUS analysis methods [13]. Moreover, our analysis demonstrates that whilst the first-order terms
$O(\log K)$ in the multi-user diversity gain are unaffected by the number of receive antennas $N$, the
second-order term grows linearly with both $N$ and $M$. This is consistent with a similar conclusion
made in [14], which considered a different system configuration.

4) We can also draw insights into the design of $\delta$. For practical systems with finite numbers of
users, obtaining the exact $\delta$ which yields the optimal complexity-performance tradeoff remains
a challenging open problem. However, our asymptotic analysis still provides guidance for the
implementation of practical SUS algorithms. In particular, we see that the choice of $\delta$ is closely
related to $K$ and $M$ and, to minimize complexity, it is clearly desirable to select $\delta$ to decrease
with increasing $K$. At the same time, however, for finite numbers of users it is advisable to
"overcompensate" and select $\delta$ to easily meet the conditions in (29). In our numerical experiments,
we found that for systems with M < 8, the choice of d = can work well. In addition, since the 
number of candidate users decreases with each iteration of the SUS algorithm, further complexity 
savings can be achieved by adaptively selecting 6; e.g., at iteration n, setting dn = j^^j^ | . 

5) Although the results in Section IV-A and IV-B demonstrate that both the ZFDPC-SUS and ZFBF- 
SUS schemes achieve the same asymptotic average sum rate, the speed of convergence to this 
optimal sum rate can be very different. Intuitively, this performance difference is caused by a 
reduction in the effective channel gain [13] seen by the ZFBF receivers. Thus, for finite K, there 
will be a gap in the average sum rates of the two schemes. We will now study this more closely. 

V. Sum Rate Analysis - Finite K 

In this section, we analyze the achievable sum rates of the ZFDPC-SUS and ZFBF-SUS schemes for
finite numbers of users. To obtain clear insights, we focus on the high and low SNR regimes. Our analysis
is based on studying the gap between the sum rates achieved by the two transceivers and a fixed upper
bound. This study follows the method of [23], which considered single-user MIMO receivers. We will
first evaluate the performance for a given set of channel realizations, and then investigate the average
performance via simulations. We make the same assumptions as stated at the beginning of Section IV.

Given a set of $M$ users $\pi$ determined by user selection*, the sum capacity of the MIMO broadcast
channel $\{\mathbf{H}_{\pi(k)}\}_{k=1}^{M}$ can be written, by using the duality of the MIMO broadcast channel and the MIMO
multiple access channel, as [4]
$$C_{\text{BC}}\left(\{\mathbf{H}_{\pi(k)}\}_{k=1}^{M}, P\right) = \max_{\sum_{k} \mathrm{tr}\,\mathbf{Q}_k \leq P} \log_2 \det\left(\mathbf{I} + \sum_{k=1}^{M} \mathbf{H}_{\pi(k)}^{H} \mathbf{Q}_k \mathbf{H}_{\pi(k)}\right).$$
Since no closed-form solution exists, it is very difficult to compare $C_{\text{BC}}(\{\mathbf{H}_{\pi(k)}\}_{k=1}^{M}, P)$ with $R_{\text{ZFDPC-SUS}}$
and $R_{\text{ZFBF-SUS}}$. In fact, even with our assumption of equal power allocation (power $\rho = P/M$ per stream), this problem
is still difficult, due to the complicated structure of the compound channel matrix $\mathbf{C}_{\pi,d}$ for the ZFDPC and
ZFBF schemes (see (5)). Thus, to analyze the difference in sum rate between $R_{\text{ZFDPC-SUS}}$ and $R_{\text{ZFBF-SUS}}$
for finite $K$, we adopt an indirect approach and focus on characterizing the differences between the
sum rates achieved by the two transceiver structures and $C$, where $C = \log_2 \det\left(\mathbf{I}_M + \rho\, \mathbf{C}_{\pi,d} \mathbf{C}_{\pi,d}^{H}\right)$ and
$\rho = P/M$.

*For a meaningful comparison, we will assume that for both schemes, the same SUS selection criterion is used, based on (14).
As such, the active user sets and the corresponding compound channel matrix $\mathbf{C}_{\pi,d}$ will be the same for both schemes.

Before presenting our main results, it is worth noting that [5, Theorem 3] $\lim_{P \to \infty} C_{\text{BC}}(\mathbf{C}_{\pi,d}, P) - C = 0$,
where $C_{\text{BC}}(\mathbf{C}_{\pi,d}, P)$ denotes the sum capacity of a MIMO broadcast system given by (5). Moreover,
for the case $N = 1$, $\{\mathbf{H}_{\pi(k)}\}_{k=1}^{M}$ reduces to $\mathbf{C}_{\pi,d}$ and $C_{\text{BC}}(\{\mathbf{H}_{\pi(k)}\}_{k=1}^{M}, P)$ coincides with $C_{\text{BC}}(\mathbf{C}_{\pi,d}, P)$.
Thus, the high SNR results which we establish below correspond precisely to the gaps between the sum
rates achieved by the two transceivers and the sum capacity achieved with optimal DPC. Define

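$$\eta_i = \frac{\sum_{j=1}^{i-1} |l_{i,j}|^2}{|l_{i,i}|^2}, \qquad \kappa_i = \frac{\sum_{j=i+1}^{M} |t_{j,i}|^2}{|t_{i,i}|^2}, \tag{51}$$

where $l_{i,j}$ and $t_{i,j}$ are the $(i,j)$-th elements of matrices $\mathbf{L}_{\pi,d}$ and $\mathbf{T}_{\pi,d}$ respectively. Some basic
manipulations of the results in [23] yield the following theorem: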

Theorem 3: For a finite number of users $K$, and finite numbers of transmit and receive antennas $M$ and $N$,

• In the high SNR region:

$$C - R_{\text{ZFDPC-SUS}} = \frac{1}{\rho \log 2} \sum_{i=1}^{M} \frac{\kappa_i}{\lambda_{\pi(i)} |l_{i,i}|^2} + O(\rho^{-2}), \tag{52}$$

$$C - R_{\text{ZFBF-SUS}} = \sum_{i=1}^{M} \log_2\left(1 + \kappa_i\right) + O(\rho^{-2}). \tag{53}$$

• In the low SNR region:

$$C - R_{\text{ZFDPC-SUS}} = \frac{\rho}{\log 2} \sum_{i=1}^{M} \eta_i \lambda_{\pi(i)} |l_{i,i}|^2 + O(\rho^{2}), \tag{54}$$

$$C - R_{\text{ZFBF-SUS}} = \frac{\rho}{\log 2} \sum_{i=1}^{M} \left(1 + \eta_i - \frac{1}{1 + \kappa_i}\right) \lambda_{\pi(i)} |l_{i,i}|^2 + O(\rho^{2}). \tag{55}$$
From these results, we can make the following conclusions. 

High SNR Region: As $\rho \to \infty$, for ZFDPC-SUS the sum rate approaches $C$, whereas for ZFBF-SUS
there is a constant sum rate gap of $\Delta = \sum_{i=1}^{M} \log_2(1 + \kappa_i)$. This gap can be zero only when $\kappa_i = 0$ for all $i$, which
is a rare case corresponding to complete orthogonality between the row vectors of $\mathbf{C}_{\pi,d}$. Subtracting (52)
from (53), in this region we can also quantify the sum rate gap between ZFDPC-SUS and ZFBF-SUS
as $R_{\text{ZFDPC-SUS}} - R_{\text{ZFBF-SUS}} = \Delta + O(\rho^{-1})$, which shows the advantage of ZFDPC-SUS for finite $K$.

Low SNR Region: As $\rho \to 0$, for both ZFDPC-SUS and ZFBF-SUS, the sum rate gaps w.r.t. $C$
approach zero linearly with $\rho$. Moreover, subtracting (54) from (55), in this region we can again quantify the sum rate gap as
$R_{\text{ZFDPC-SUS}} - R_{\text{ZFBF-SUS}} = \frac{\rho}{\log 2} \sum_{i=1}^{M} \left(1 - \frac{1}{1 + \kappa_i}\right) \lambda_{\pi(i)} |l_{i,i}|^2 + O(\rho^2)$, whose leading term is non-negative. It is also worth noting
that in the low SNR regime, better performance may be achievable by transmitting with full power to only
a single user, rather than sending equal power streams to $M$ selected users. The benefit of this approach,
however, will depend not only on the SNR value, but also on the number of users $K$. In particular, the
benefit of this approach is expected to be most evident when $K$ is small, for which case there will be
the most disparity between the dominant eigen-channels of the users.

Effect of SUS Parameter $\delta$: According to the SUS algorithm, we have $|l_{i,j}|^2 < \delta$ for $i > j$, and
$|l_{i,i}|^2 > 1 - (i-1)\delta$. Thus, with a smaller semi-orthogonality parameter $\delta$, it is more likely to have
off-diagonal elements with smaller absolute value in both $\mathbf{L}_{\pi,d}$ and $\mathbf{T}_{\pi,d}$ (i.e. smaller $|l_{i,j}|$, $j < i$, and
$|t_{j,i}|$, $i < j$) and more likely to have diagonal elements with larger absolute value in $\mathbf{L}_{\pi,d}$. From (51),
these observations imply that a smaller $\delta$ leads to smaller $\eta_i$ and $\kappa_i$. In addition, it is easy to see that
$\eta_i |l_{i,i}|^2 = \sum_{j=1}^{i-1} |l_{i,j}|^2$ and $(1 + \eta_i)|l_{i,i}|^2 = 1$. With these results, we see that by decreasing $\delta$, the sum
rate gaps for both transceivers are likely to decrease, for both high and low SNRs. This implies that the
sum rates of both transceivers are likely to increase, which agrees with intuition.
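The high and low SNR behavior described above can be checked numerically for a single channel realization. The sketch below is a minimal illustration under stated assumptions: equal power $\rho$ per stream, ZFDPC effective gains taken as the squared diagonal entries of the LQ decomposition of the compound channel matrix, and ZFBF effective gains taken as the reciprocals of the diagonal entries of $(\mathbf{C}\mathbf{C}^H)^{-1}$. The function name `sum_rates` and the example matrix are hypothetical and not part of the paper.

```python
import numpy as np

def sum_rates(C, rho):
    """Return (C_ub, R_zfdpc, R_zfbf, Delta) in bits/s/Hz for a square compound channel C.

    Assumes equal power rho per stream (an assumption of this sketch).
    """
    M = C.shape[0]
    G = C @ C.conj().T
    # Upper bound C = log2 det(I + rho * C C^H)
    c_ub = np.linalg.slogdet(np.eye(M) + rho * G)[1] / np.log(2.0)
    # LQ decomposition via QR of C^H: C^H = Q R  =>  C = R^H Q^H, with R^H lower triangular
    _, R = np.linalg.qr(C.conj().T)
    dpc_gain = np.abs(np.diag(R)) ** 2                   # ZFDPC effective channel gains
    bf_gain = 1.0 / np.real(np.diag(np.linalg.inv(G)))   # ZFBF effective channel gains
    r_zfdpc = float(np.sum(np.log2(1.0 + rho * dpc_gain)))
    r_zfbf = float(np.sum(np.log2(1.0 + rho * bf_gain)))
    # Constant high-SNR gap of ZFBF: sum of log2 of the gain ratios
    delta_gap = float(np.sum(np.log2(dpc_gain / bf_gain)))
    return float(c_ub), r_zfdpc, r_zfbf, delta_gap

if __name__ == "__main__":
    # Two unit-norm rows that are not orthogonal (illustrative values only)
    C = np.array([[1.0, 0.0], [0.6, 0.8]], dtype=complex)
    for rho in (1e-4, 1.0, 1e4, 1e8):
        c_ub, r_dpc, r_bf, gap = sum_rates(C, rho)
        print(rho, c_ub - r_dpc, c_ub - r_bf, gap)
```

For this example the ZFDPC gap to the upper bound vanishes as $\rho$ grows, the ZFBF gap settles at the constant returned as `delta_gap`, and both gaps shrink towards zero as $\rho \to 0$.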

Fig. 1 demonstrates the average sum rate gaps of ZFDPC-SUS and ZFBF-SUS for different SNRs. 
Results are shown for M = 4, N = 4, K = 50, and 5 = j^^- These results confirm our analytical 
conclusions given above, based on Theorem 3. 

VI. Numerical Results 

For our simulations, we use P = 15 dB, 6 = i^^^ , and the optimal water-filling power allocation. 
Fig. 2 plots the average sum rate achieved by ZFDPC-SUS and ZFBF-SUS as a function of the number
of users. Curves are also presented for ZFBF with complete search, as well as optimal DPC. In the first
case, a search is conducted over all combinations of users, and the combination with the highest sum rate
is selected. Due to the very high complexity of this approach, we only provide results for relatively small
$K$. The optimal DPC curve acts as an achievable upper bound, and is computed using the algorithm
from [24]. In addition, based on (98) and the expressions for $u_n$ in (42) and $\chi_n$ in (43), we have plotted
$\sum_{i=1}^{M} \log_2\left(1 + \rho(\log K + (M + N - i - 1)\log\log K)\right)$ as an asymptotic approximation for the average
sum rate of the ZFDPC-SUS scheme. As evident from the figure, the performance of ZFDPC-SUS is very
close to that of DPC, and is slowly converging to DPC as $K$ grows large. The asymptotic approximation
for ZFDPC-SUS based on our analysis is also quite good (within 1 bps/Hz). Considering ZFBF, we see
that the ZFBF-SUS curve is no more than 0.5 dB away from that of the complete search method, further
verifying the utility of the SUS approach. Moreover, the ZFBF curves are far below the ZFDPC-SUS
curve, demonstrating that ZFDPC-SUS has significant performance advantages at finite $K$. For further
comparison, we have also implemented a related algorithm proposed in [15] and plotted the corresponding
sum rate curve. This curve is generated by using an optimal threshold, computed by an exhaustive search.
The performance is close to that of ZFBF-SUS.

Fig. 1. Comparison of sum rate gap for different SNRs. M = 4, N = 4, K = 50.
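The asymptotic approximation quoted above is simple to evaluate. The following sketch computes $\sum_{i=1}^{M} \log_2(1 + \rho(\log K + (M+N-i-1)\log\log K))$; it assumes that the transmit power is given in dB relative to unit noise power and that $\rho = P/M$ in linear scale, and the function name is hypothetical.

```python
import math

def zfdpc_sus_asymptotic_sum_rate(K, M, N, P_dB):
    """Asymptotic approximation of the ZFDPC-SUS average sum rate in bits/s/Hz."""
    rho = 10.0 ** (P_dB / 10.0) / M   # assumed: P in dB over unit noise, equal power split
    log_k = math.log(K)
    loglog_k = math.log(log_k)
    return sum(
        math.log2(1.0 + rho * (log_k + (M + N - i - 1) * loglog_k))
        for i in range(1, M + 1)
    )

if __name__ == "__main__":
    for K in (10**2, 10**3, 10**4):
        r4 = zfdpc_sus_asymptotic_sum_rate(K, M=4, N=4, P_dB=15.0)
        r5 = zfdpc_sus_asymptotic_sum_rate(K, M=4, N=5, P_dB=15.0)
        print(K, round(r4, 3), round(r5 - r4, 3))
```

The second printed column is the approximate gain from one additional receive antenna, which in this approximation shrinks slowly as $K$ increases.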

Fig. 3 compares the average sum rate of ZFDPC-SUS and ZFBF-SUS as a function of the number of
users, for different numbers of receive antennas. Note that according to (98) and the expressions for $u_n$
and $\chi_n$ in (42) and (43) respectively, if we increase the number of receive antennas by one, the increase
in sum rate can be approximated as $M \log_2\left(1 + \frac{\log\log K}{\log K}\right)$ as $K \to \infty$; i.e., the difference in sum
rate will be negligible for large $K$. However, the figure shows that this convergence is very slow, and
that increasing the number of receive antennas can significantly increase the sum rate for finite $K$.

Fig. 2. Comparison of average sum rates for different numbers of users. M = 4, N = 4, P = 15 dB.

VII. Conclusion 

We have investigated the sum rate of two low complexity eigenmode-based transmission techniques for
the MIMO broadcast channel, ZFDPC-SUS and ZFBF-SUS. We proved that ZFDPC-SUS can achieve
the optimal sum rate scaling of the MIMO broadcast channel, and that the average sum rate of both
techniques converges to the average sum capacity of the MIMO broadcast channel as $K$ grows large
(albeit at different rates). We also investigated and compared the achievable sum rates of ZFDPC-SUS
and ZFBF-SUS for finite $K$, and demonstrated that ZFDPC-SUS has significant performance advantages.
In contrast to most previous related results, our analytical results provide important insights into the
benefit of multiple receive antennas, and the effect of the SUS algorithm.

Fig. 3. Comparison of average sum rates for different numbers of users and different numbers of receive
antennas. M = 4, P = 15 dB.

Appendix A 
Proof of Lemma 2 

Our derivation closely follows the method of proof for [18, Lemma 3] and [25, Lemma 1]. For two
complex vectors $\mathbf{z} = \mathbf{z}_r + j\mathbf{z}_i$ and $\mathbf{z}' = \mathbf{z}'_r + j\mathbf{z}'_i$ with the same dimension, we write $\mathbf{z} \preceq \mathbf{z}'$ if every
element of $\mathbf{z}_r$ and $\mathbf{z}_i$ is less than or equal to its counterpart in $\mathbf{z}'_r$ and $\mathbf{z}'_i$, respectively. Let $\mathcal{K}_n$ denote
the cardinality of the candidate set $\mathcal{U}_n$. For the first iteration, $\mathcal{K}_1 = K$ and $\mathbf{c}_{\pi(1)}$ is the vector with
the maximum norm. For clarity of exposition, at the end of the $n$-th iteration, we relabel the eigen-channel
vectors in $\mathcal{U}_n \setminus \{\pi(n)\}$ as $\mathbf{c}_1, \ldots, \mathbf{c}_{\mathcal{K}_n - 1}$.

We find that the result in [25, Lemma 1], which was derived specifically for Gaussian vectors, holds
more generally and does not require the Gaussian assumption, and indeed can also be adapted to our
case. The proof is based on induction. For the first iteration, we have

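$$\Pr\left\{\mathbf{c}_1 \preceq \mathbf{z}_1, \ldots, \mathbf{c}_{K-1} \preceq \mathbf{z}_{K-1} \,\middle|\, \mathbf{c}_{\pi(1)} = \mathbf{z}_{(1)}\right\} = \prod_{i=1}^{K-1} \Pr\left\{\mathbf{c}_i \preceq \mathbf{z}_i \,\middle|\, \|\mathbf{c}_i\| \leq \|\mathbf{z}_{(1)}\|\right\} \tag{56}$$

and since $\lim_{K \to \infty} \|\mathbf{z}_{(1)}\| = \infty$,

$$\lim_{K \to \infty} \Pr\left\{\mathbf{c}_i \preceq \mathbf{z}_i \,\middle|\, \|\mathbf{c}_i\| \leq \|\mathbf{z}_{(1)}\|\right\} = F_c(\mathbf{z}_i), \tag{57}$$

where $F_c(\cdot)$ is the c.d.f. of the principal eigen-vector of a complex Wishart matrix.

Now assume that this lemma holds up to the $(n-1)$-th iteration and let us consider the $n$-th iteration.
Conditioned on $\mathbf{c}_{\pi(1)}, \ldots, \mathbf{c}_{\pi(n-1)}$, according to our assumption, the channel vectors in $\mathcal{U}_n$ are i.i.d. and
converge in distribution to the principal eigen-vector of a complex Wishart matrix. At the end of step
3) of the $n$-th iteration, user $\pi(n)$ is chosen. Any user $k$ in $\mathcal{U}_n$ satisfies $\gamma_k(n) \leq \gamma_{\pi(n)}$. Replacing the
conditions* $\{\mathbf{c}_{\pi(1)} = \mathbf{z}_{(1)}\}$ and $\{\|\mathbf{c}_i\| \leq \|\mathbf{z}_{(1)}\|\}$ by $\{\mathbf{c}_{\pi(1)} = \mathbf{z}_{(1)}, \ldots, \mathbf{c}_{\pi(n-1)} = \mathbf{z}_{(n-1)}, \mathbf{c}_{\pi(n)} = \mathbf{z}_{(n)}\}$
and $\{\mathbf{c}_{\pi(1)} = \mathbf{z}_{(1)}, \ldots, \mathbf{c}_{\pi(n-1)} = \mathbf{z}_{(n-1)}, \gamma_k(n) \leq \gamma_{\pi(n)}\}$ respectively in the derivation in [25, Lemma 1]
and following the same method as in [25, Lemma 1], we can establish that the remaining channel vectors
in $\mathcal{U}_n$ are i.i.d. with c.d.f.

$$\Pr\left\{\mathbf{c}_i \preceq \mathbf{z}_i \,\middle|\, \mathbf{c}_{\pi(1)} = \mathbf{z}_{(1)}, \ldots, \mathbf{c}_{\pi(n-1)} = \mathbf{z}_{(n-1)}, \gamma_k(n) \leq \gamma_{\pi(n)}\right\} \tag{58}$$

for $i = 1, \ldots, \mathcal{K}_n - 1$. Since $\lim_{K \to \infty} \mathcal{K}_n = \infty$, $\gamma_{\pi(n)}$ is unbounded from above, i.e.,

$$\lim_{K \to \infty} \gamma_{\pi(n)} = \infty, \tag{59}$$

and we have

$$\lim_{K \to \infty} \Pr\left\{\mathbf{c}_i \preceq \mathbf{z}_i \,\middle|\, \mathbf{c}_{\pi(1)} = \mathbf{z}_{(1)}, \ldots, \mathbf{c}_{\pi(n-1)} = \mathbf{z}_{(n-1)}, \gamma_k(n) \leq \gamma_{\pi(n)}\right\} = \Pr\left\{\mathbf{c}_i \preceq \mathbf{z}_i \,\middle|\, \mathbf{c}_{\pi(1)} = \mathbf{z}_{(1)}, \ldots, \mathbf{c}_{\pi(n-1)} = \mathbf{z}_{(n-1)}\right\}. \tag{60}$$

By induction, $\Pr\{\mathbf{c}_i \preceq \mathbf{z}_i \mid \mathbf{c}_{\pi(1)} = \mathbf{z}_{(1)}, \ldots, \mathbf{c}_{\pi(n-1)} = \mathbf{z}_{(n-1)}\}$ converges to the distribution
of the principal eigen-vector of a complex Wishart matrix, thereby establishing the lemma.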

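*To be more precise, we note that different notation is used in [18]. Our conditions $\{\mathbf{c}_{\pi(1)} = \mathbf{z}_{(1)}, \ldots, \mathbf{c}_{\pi(n)} = \mathbf{z}_{(n)}\}$ and
$\{\mathbf{c}_{\pi(1)} = \mathbf{z}_{(1)}, \ldots, \mathbf{c}_{\pi(n-1)} = \mathbf{z}_{(n-1)}, \gamma_k(n) \leq \gamma_{\pi(n)}\}$ are analogous to the conditions $\{\mathbf{h}_{\pi(1)} = \mathbf{z}_{(1)}, \ldots, \mathbf{h}_{\pi(n)} = \mathbf{z}_{(n)}\}$ and
$\{\mathbf{h}_{\pi(1)} = \mathbf{z}_{(1)}, \ldots, \mathbf{h}_{\pi(n-1)} = \mathbf{z}_{(n-1)}, R^{(n)}(\mathbf{h}_i) \leq R^{(n)}(\mathbf{z}_{(n)})\}$ given in [18].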
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Appendix B 
Proof of Lemma 3 

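According to Lemma 2, the eigen-vector $\mathbf{v}_k$, for $k \in \mathcal{U}_n$, is an isotropically distributed unit vector
on the $M$-dimensional complex unit hypersphere. In addition, for large $K$, the subspace spanned by the
orthonormal basis $\mathbf{q}_1, \ldots, \mathbf{q}_{n-1}$ becomes independent of $\mathbf{v}_k$. Thus, without loss of generality we can
assume $\mathbf{q}_i = \mathbf{e}_i$, where $\mathbf{e}_i$ is the $i$-th row of the identity matrix $\mathbf{I}_M$. Let $\mathbf{v}_k = [v_1, \ldots, v_M]^T$, then

$$\mu_n(\delta) = \Pr\left(|\mathbf{v}_k^H \mathbf{q}_1^H|^2 \leq \delta, \ldots, |\mathbf{v}_k^H \mathbf{q}_{n-1}^H|^2 \leq \delta\right) = \Pr\left(|v_1|^2 \leq \delta, \ldots, |v_{n-1}|^2 \leq \delta\right). \tag{61}$$

In the following we will first derive the joint p.d.f. of $|v_1|^2, \ldots, |v_{n-1}|^2$.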

The surface area of a complex unit hypersphere of $M$ dimensions is $\frac{2\pi^M}{\Gamma(M)}$ [26]. So the joint p.d.f. of
$v_1, \ldots, v_M$ can be written as:

$$f(\mathbf{v}_k) = f(v_1, \ldots, v_M) = \begin{cases} \frac{\Gamma(M)}{2\pi^M}, & \|\mathbf{v}_k\| = 1 \\ 0, & \text{otherwise.} \end{cases} \tag{62}$$

Define $v_i = x_{2i-1} + j x_{2i}$. Then, the joint p.d.f. of $x_1, \ldots, x_{2M}$ can be expressed as:

$$f(x_1, x_2, \ldots, x_{2M}) = \begin{cases} \frac{\Gamma(M)}{2\pi^M}, & \sum_{i=1}^{2M} x_i^2 = 1 \\ 0, & \text{otherwise.} \end{cases} \tag{63}$$

We require the joint p.d.f. of $x_1, \ldots, x_{2(n-1)}$, which is evaluated via

$$f(x_1, \ldots, x_{2(n-1)}) = \int \cdots \int f(x_1, \ldots, x_{2M}) \, dx_{2(n-1)+1} \cdots dx_{2M} = \frac{\Gamma(M)}{2\pi^M} V(x_1, \ldots, x_{2(n-1)}) \tag{64}$$

where $V(x_1, \ldots, x_{2(n-1)})$ denotes the area

$$V(x_1, \ldots, x_{2(n-1)}) = \int \cdots \int_{\sum_{i=2(n-1)+1}^{2M} x_i^2 \,=\, 1 - \sum_{i=1}^{2(n-1)} x_i^2} dx_{2(n-1)+1} \cdots dx_{2M}. \tag{65}$$
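The multi-dimensional integral (65) is seen to be the surface area of a real $(2M - 2(n-1))$-dimensional
hypersphere of radius $\sqrt{1 - \sum_{i=1}^{2(n-1)} x_i^2}$. Thus, using results from [26], we evaluate this integral as
follows:

$$V(x_1, \ldots, x_{2(n-1)}) = \frac{2\pi^{M-n+1}}{\Gamma(M-n+1)} \left(\sqrt{1 - \sum_{i=1}^{2(n-1)} x_i^2}\right)^{2(M-n+1)-1} \times \sqrt{\det \mathbf{A}} \; dx_1 \cdots dx_{2(n-1)}, \tag{66}$$

where $\mathbf{A}$ is a $(2(n-1)+1) \times (2(n-1)+1)$ matrix with $(i,j)$-th element $A_{i,j} = \frac{\partial \mathbf{x}}{\partial x_i} \cdot \frac{\partial \mathbf{x}}{\partial x_j}$ with
$\mathbf{x} = \left(x_1, \ldots, x_{2(n-1)}, \sqrt{1 - \sum_{i=1}^{2(n-1)} x_i^2}\right)$, and '$\cdot$' denotes the vector inner product operation. We can
compute $A_{i,j} = \delta_{i,j} + \frac{x_i x_j}{1 - \sum_{m=1}^{2(n-1)} x_m^2}$, where $\delta_{i,j}$ is the Kronecker-delta function, and after some manipulations
obtain $\det \mathbf{A} = \frac{1}{1 - \sum_{i=1}^{2(n-1)} x_i^2}$. Combining this result with (64) and (66) we obtain

$$f(x_1, \ldots, x_{2(n-1)}) = \frac{\Gamma(M)}{\pi^{n-1}\, \Gamma(M-n+1)} \left(1 - \sum_{i=1}^{2(n-1)} x_i^2\right)^{M-n}. \tag{67}$$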

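It is now convenient to make the polar coordinate transformations $x_{2i-1} = r_i \cos\theta_i$, $x_{2i} = r_i \sin\theta_i$, for
$i = 1, \ldots, n-1$, where $r_i \geq 0$, $0 \leq \theta_i \leq 2\pi$. The corresponding Jacobian is easily evaluated as [26]
$\prod_{i=1}^{n-1} r_i$. So the joint density of $r_1, \ldots, r_{n-1}$ is

$$f(r_1, \ldots, r_{n-1}) = \frac{\Gamma(M)}{\Gamma(M-n+1)\, \pi^{n-1}} \left(1 - \sum_{i=1}^{n-1} r_i^2\right)^{M-n} \prod_{i=1}^{n-1} r_i \times \prod_{i=1}^{n-1} \int_0^{2\pi} d\theta_i = \frac{2^{n-1}\, \Gamma(M)}{\Gamma(M-n+1)} \left(1 - \sum_{i=1}^{n-1} r_i^2\right)^{M-n} \prod_{i=1}^{n-1} r_i. \tag{68}$$

Next we apply the transformation $t_i = r_i^2$, $i = 1, \ldots, n-1$. Clearly $t_i = |v_i|^2$ (we will deal with $t_i$
subsequently to simplify notation). The corresponding Jacobian is $J(t_1, \ldots, t_{n-1}) = 1/\left(2^{n-1} \sqrt{t_1 \cdots t_{n-1}}\right)$.
So we obtain the desired joint p.d.f. of $t_1, \ldots, t_{n-1}$ as

$$f(t_1, \ldots, t_{n-1}) = \frac{\Gamma(M)}{\Gamma(M-n+1)} \left(1 - \sum_{i=1}^{n-1} t_i\right)^{M-n}. \tag{69}$$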

Armed with this result, we can now evaluate the desired probability $\mu_n(\delta)$ in (61). For notational
convenience, we will consider $\mu_{n+1}(\delta)$, for $n + 1 \in \{2, \ldots, M\}$. Denoting $\mathcal{D}_n = \{0 \leq t_1 \leq \delta, \ldots, 0 \leq
t_n \leq \delta\}$, we have

$$\mu_{n+1}(\delta) = \int \cdots \int_{\mathcal{D}_n} f(t_1, \ldots, t_n) \, dt_1 \cdots dt_n = \frac{\Gamma(M)}{\Gamma(M-n)}\, \varphi_n(1) \tag{70}$$

where we have defined

$$\varphi_n(z) = \int \cdots \int_{\mathcal{D}_n} \left(z - \sum_{i=1}^{n} t_i\right)^{M-n-1} dt_1 \cdots dt_n \tag{71}$$

for $z \geq n\delta$. Note that with this definition, $\varphi_n(1)$ exists for all $n$ provided that $\delta < \frac{1}{M-1}$. This condition
is assumed in the lemma statement. Then $\varphi_n(z)$ can be written as

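$$\begin{aligned}
\varphi_n(z) &= \int \cdots \int_{\mathcal{D}_{n-1}} \left( \int_0^{\delta} \left(z - \sum_{i=1}^{n} t_i\right)^{M-n-1} dt_n \right) dt_1 \cdots dt_{n-1} \\
&= \frac{1}{M-n} \int \cdots \int_{\mathcal{D}_{n-1}} \left[ \left(z - \sum_{i=1}^{n-1} t_i\right)^{M-n} - \left(z - \delta - \sum_{i=1}^{n-1} t_i\right)^{M-n} \right] dt_1 \cdots dt_{n-1} \\
&= \frac{1}{M-n} \left( \varphi_{n-1}(z) - \varphi_{n-1}(z - \delta) \right).
\end{aligned} \tag{72}$$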

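So we have

$$\varphi_n(1) = \frac{1}{M-n} \left( \varphi_{n-1}(1) - \varphi_{n-1}(1 - \delta) \right) \tag{73}$$

$$\phantom{\varphi_n(1)} = \frac{1}{(M-n)(M-n+1)} \left( \varphi_{n-2}(1) - 2\varphi_{n-2}(1 - \delta) + \varphi_{n-2}(1 - 2\delta) \right). \tag{74}$$

We will now prove, using mathematical induction, that for any integer $k \in \{1, 2, \ldots, n-1\}$,

$$\varphi_n(1) = \left[ \prod_{j=0}^{k-1} (M - n + j) \right]^{-1} \sum_{i=0}^{k} \binom{k}{i} (-1)^i \varphi_{n-k}(1 - i\delta). \tag{75}$$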

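According to (73) and (74), (75) holds for $k = 1$ and $k = 2$ respectively. Assuming that (75) holds for
integer $k$, applying (72) in (75) yields

$$\begin{aligned}
\varphi_n(1) &= \left[ \prod_{j=0}^{k} (M - n + j) \right]^{-1} \sum_{i=0}^{k} \binom{k}{i} (-1)^i \left( \varphi_{n-k-1}(1 - i\delta) - \varphi_{n-k-1}(1 - (i+1)\delta) \right) && (76) \\
&= \left[ \prod_{j=0}^{k} (M - n + j) \right]^{-1} \left\{ \varphi_{n-k-1}(1) + (-1)^{k+1} \varphi_{n-k-1}(1 - (k+1)\delta) + \sum_{i=0}^{k-1} \binom{k+1}{i+1} (-1)^{i+1} \varphi_{n-k-1}(1 - (i+1)\delta) \right\} && (77) \\
&= \left[ \prod_{j=0}^{k} (M - n + j) \right]^{-1} \sum_{i=0}^{k+1} \binom{k+1}{i} (-1)^i \varphi_{n-k-1}(1 - i\delta) && (78)
\end{aligned}$$

where, to obtain (77), we have used $\binom{k+1}{i} = \binom{k}{i} + \binom{k}{i-1}$. Thus, from (78), if (75) holds for integer $k$,
it also holds for $k + 1$. By induction, (75) then holds for any integer $1 \leq k < n$. Setting $k = n - 1$ in
(75),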

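$$\varphi_n(1) = \left[ \prod_{j=0}^{n-2} (M - n + j) \right]^{-1} \sum_{i=0}^{n-1} \binom{n-1}{i} (-1)^i \varphi_1(1 - i\delta). \tag{79}$$

The function $\varphi_1(1 - i\delta)$ can be evaluated as

$$\varphi_1(1 - i\delta) = \int_0^{\delta} (1 - i\delta - t_1)^{M-2} \, dt_1 = \frac{1}{M-1} \left( (1 - i\delta)^{M-1} - [1 - (i+1)\delta]^{M-1} \right). \tag{80}$$

Substituting (80) into (79) yields a closed-form solution, which we simplify as follows:

$$\begin{aligned}
\varphi_n(1) &= \frac{\Gamma(M-n)}{\Gamma(M)} \sum_{i=0}^{n-1} \binom{n-1}{i} (-1)^i \left( (1 - i\delta)^{M-1} - [1 - (i+1)\delta]^{M-1} \right) \\
&= \frac{\Gamma(M-n)}{\Gamma(M)} \sum_{i=0}^{n} \binom{n}{i} (-1)^i (1 - i\delta)^{M-1} \\
&= \frac{\Gamma(M-n)}{\Gamma(M)} \sum_{k=0}^{M-1} \binom{M-1}{k} (-\delta)^k \sum_{i=0}^{n} \binom{n}{i} (-1)^i i^k.
\end{aligned} \tag{81}$$

Since [27]

$$\sum_{k=0}^{N} (-1)^k \binom{N}{k} k^{n-1} = 0, \quad 1 \leq n \leq N, \tag{82}$$

$$\sum_{k=0}^{N} (-1)^k \binom{N}{k} k^{N} = (-1)^N N!, \quad N \geq 0, \tag{83}$$

we obtain $\varphi_n(1) = \frac{\Gamma(M-n)}{\Gamma(M)} \sum_{k=n}^{M-1} \binom{M-1}{k} (-\delta)^k \left[ \sum_{i=0}^{n} \binom{n}{i} (-1)^i i^k \right]$. Substituting into (70) yields (27).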
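The closed form obtained in this appendix can be sanity-checked by simulation. The sketch below assumes that (27) reads $\mu_n(\delta) = \sum_{k=n-1}^{M-1} \binom{M-1}{k} (-\delta)^k \sum_{i=0}^{n-1} \binom{n-1}{i} (-1)^i i^k$, which is equivalent to the compact form $\sum_{i=0}^{n-1} \binom{n-1}{i} (-1)^i (1 - i\delta)^{M-1}$ for $\delta < 1/(M-1)$, and compares both against a Monte Carlo estimate using isotropic unit vectors generated by normalizing complex Gaussian vectors. The function names, sample size and seed are illustrative choices.

```python
import math
import numpy as np

def mu_closed_form(n, M, delta):
    """Assumed form of (27): double alternating sum, valid for delta < 1/(M-1)."""
    m = n - 1
    return sum(
        math.comb(M - 1, k) * (-delta) ** k
        * sum(math.comb(m, i) * (-1) ** i * i ** k for i in range(m + 1))
        for k in range(m, M)
    )

def mu_compact(n, M, delta):
    """Equivalent inclusion-exclusion form: sum_i C(n-1,i) (-1)^i (1 - i*delta)^(M-1)."""
    return sum(
        math.comb(n - 1, i) * (-1) ** i * (1.0 - i * delta) ** (M - 1)
        for i in range(n)
    )

def mu_monte_carlo(n, M, delta, trials=200_000, seed=0):
    """Fraction of isotropic unit vectors in C^M whose first n-1 squared entries are <= delta."""
    rng = np.random.default_rng(seed)
    g = rng.standard_normal((trials, M)) + 1j * rng.standard_normal((trials, M))
    t = np.abs(g) ** 2
    t /= t.sum(axis=1, keepdims=True)  # squared magnitudes of a unit-norm isotropic vector
    return float(np.mean(np.all(t[:, : n - 1] <= delta, axis=1)))

if __name__ == "__main__":
    M, n, delta = 4, 3, 0.3
    print(mu_closed_form(n, M, delta), mu_compact(n, M, delta), mu_monte_carlo(n, M, delta))
```

For $M = 4$, $n = 3$ and $\delta = 0.3$ the two closed forms agree with each other and with direct integration of the density (69) over the square $[0, \delta]^2$, and the simulated frequency should match to within sampling error.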

Appendix C 
Proof of Lemma 4 

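Similar to the proof of Lemma 3, we assume $\mathbf{q}_i = \mathbf{e}_i$ without loss of generality. Then the numerator
of (26) is given by

$$\Pr\left( \sum_{i=1}^{n-1} |\mathbf{v}_k^H \mathbf{q}_i^H|^2 \leq 1 - x, \; |\mathbf{v}_k^H \mathbf{q}_1^H|^2 \leq \delta, \ldots, |\mathbf{v}_k^H \mathbf{q}_{n-1}^H|^2 \leq \delta \right) = \Pr\left( \sum_{i=1}^{n-1} |v_i|^2 \leq 1 - x, \; |v_1|^2 \leq \delta, \ldots, |v_{n-1}|^2 \leq \delta \right). \tag{84}$$

Recalling that $t_i = |v_i|^2$, $i = 1, 2, \ldots, n-1$, we can evaluate (84) using the joint p.d.f. $f(t_1, \ldots, t_{n-1})$
given in (69) in Appendix B. For $n = 2$, we have

$$\Pr\left( |\mathbf{v}_k^H \mathbf{q}_1^H|^2 \leq 1 - x, \; |\mathbf{v}_k^H \mathbf{q}_1^H|^2 \leq \delta \right) = \begin{cases} \int_0^{\delta} (M-1)(1 - t_1)^{M-2} \, dt_1, & x < 1 - \delta \\ \int_0^{1-x} (M-1)(1 - t_1)^{M-2} \, dt_1, & 1 - \delta \leq x \leq 1 \\ 0, & x > 1. \end{cases} \tag{85}$$

Solving the integrals in (85) and combining the result with (27) and (26) leads to the explicit solution
given in (31). For $n > 2$, the problem is much more difficult. In this case, using (69), we obtain

$$\Pr\left( \sum_{i=1}^{n-1} t_i \leq 1 - x, \; t_1 \leq \delta, \ldots, t_{n-1} \leq \delta \right) = \begin{cases} 0, & x > 1 \\ \mu_n(\delta), & x < 1 - (n-1)\delta \\ \frac{\Gamma(M)}{\Gamma(M-n+1)} \int \cdots \int \left(1 - \sum_{i=1}^{n-1} t_i\right)^{M-n} dt_1 \cdots dt_{n-1}, & 1 - (n-1)\delta \leq x \leq 1 \end{cases} \tag{86}$$

with the integration region for the remaining multi-dimensional integral defined in the lemma statement.
Combining (86) with (27) and (26) leads to (30).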
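Appendix D
Proof of Lemma 5

We can upper bound the c.d.f. (30), for $n > 2$, $1 - (n-1)\delta \leq x \leq 1$, as follows

$$\begin{aligned}
F_{\beta(n)}(x) &\leq 1 - \frac{\Gamma(M)}{\Gamma(M-n+1)\, \mu_n(\delta)} \int \cdots \int_{0 \leq t_i \leq \frac{1-x}{n-1}} \left(1 - \sum_{i=1}^{n-1} t_i\right)^{M-n} dt_1 \cdots dt_{n-1} \\
&= 1 - \frac{\mu_n\left(\frac{1-x}{n-1}\right)}{\mu_n(\delta)}
\end{aligned} \tag{87}$$

where the second line follows from (70). For $n = 2$, we have

$$1 - \frac{\mu_2(1-x)}{\mu_2(\delta)} = 1 - \frac{1 - x^{M-1}}{1 - (1-\delta)^{M-1}} \tag{88}$$

which is exactly the right-hand side of (31).

We can establish the corresponding lower bound via

$$\begin{aligned}
F_{\beta(n)}(x) &\geq 1 - \frac{\Gamma(M)}{\Gamma(M-n+1)\, \mu_n(\delta)} \int \cdots \int_{\substack{\sum_{i=1}^{n-1} t_i \leq 1-x \\ t_1 \geq 0, \ldots, t_{n-1} \geq 0}} \left(1 - \sum_{i=1}^{n-1} t_i\right)^{M-n} dt_1 \cdots dt_{n-1} \\
&= 1 - \frac{1}{\mu_n(\delta)}\, I_{1-x}(n-1, M-n+1)
\end{aligned} \tag{89}$$

where we have used the identity [27] $\int \cdots \int_{\substack{\sum_{i=1}^{n} t_i \leq a \\ t_1 \geq 0, \ldots, t_n \geq 0}} g\left(\sum_{i=1}^{n} t_i\right) dt_1 \cdots dt_n = \frac{1}{\Gamma(n)} \int_0^{a} g(s)\, s^{n-1} \, ds$. For $n = 2$, it is easily verified that
(89) is equal to (88).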
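The relation between the two bounds can be checked deterministically. The sketch below assumes that the upper bound has the form $1 - \mu_n\left(\frac{1-x}{n-1}\right)/\mu_n(\delta)$ and the lower bound the form $1 - I_{1-x}(n-1, M-n+1)/\mu_n(\delta)$, where $I_z(a,b)$ is taken to be the regularized incomplete beta function, evaluated here through its finite binomial sum for integer arguments. The function names are hypothetical, and the formulas are only meant for $n \geq 2$, $1 - (n-1)\delta \leq x \leq 1$ and $\delta < 1/(M-1)$.

```python
import math

def mu(n, M, delta):
    # Assumed compact form of mu_n(delta): sum_i C(n-1,i) (-1)^i (1 - i*delta)^(M-1)
    return sum(math.comb(n - 1, i) * (-1) ** i * (1.0 - i * delta) ** (M - 1) for i in range(n))

def reg_inc_beta_int(z, p, q):
    # Regularized incomplete beta I_z(p, q) for positive integers p, q (binomial tail sum)
    m = p + q - 1
    return sum(math.comb(m, j) * z ** j * (1.0 - z) ** (m - j) for j in range(p, m + 1))

def cdf_upper_bound(x, n, M, delta):
    return 1.0 - mu(n, M, (1.0 - x) / (n - 1)) / mu(n, M, delta)

def cdf_lower_bound(x, n, M, delta):
    return 1.0 - reg_inc_beta_int(1.0 - x, n - 1, M - n + 1) / mu(n, M, delta)

if __name__ == "__main__":
    M, n, delta = 4, 3, 0.3
    x0 = 1.0 - (n - 1) * delta
    for step in range(6):
        x = x0 + (1.0 - x0) * step / 5
        print(round(x, 3), round(cdf_lower_bound(x, n, M, delta), 4), round(cdf_upper_bound(x, n, M, delta), 4))
```

In this form the ordering of the bounds follows from the fact that the cube with side $\frac{1-x}{n-1}$ is contained in the simplex $\sum_i t_i \leq 1 - x$, and the two expressions coincide for $n = 2$.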



Appendix E 
Proof of Lemma 6 

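Recalling that for uncorrelated Wishart matrices, the eigenvalues and their corresponding eigenvectors
are independent, it follows that $\lambda_{k,\max}$ is independent of $\tilde{\beta}_k(n)$, $\beta_k(n)$, and $\hat{\beta}_k(n)$. Thus, the c.d.f.s
of $\tilde{\gamma}_k(n)$, $\gamma_k(n)$, and $\hat{\gamma}_k(n)$ can be derived as $F_{\tilde{\gamma}(n)}(x) = \int_0^{\infty} F_{\tilde{\beta}(n)}(x/y) f_{\max}(y) \, dy$, $F_{\gamma(n)}(x) =
\int_0^{\infty} F_{\beta(n)}(x/y) f_{\max}(y) \, dy$, and $F_{\hat{\gamma}(n)}(x) = \int_0^{\infty} F_{\hat{\beta}(n)}(x/y) f_{\max}(y) \, dy$ respectively, where $f_{\max}(\cdot)$ is the
p.d.f. of the maximum eigenvalue of $\mathbf{H}_k \mathbf{H}_k^H$. Together with Lemma 5, it follows trivially that $F_{\hat{\gamma}(n)}(x) \leq
F_{\gamma(n)}(x) \leq F_{\tilde{\gamma}(n)}(x)$, where the equalities hold for $n = 2$.

What remains is to derive closed-form expressions for $F_{\tilde{\gamma}(n)}(x)$ and $F_{\hat{\gamma}(n)}(x)$. First consider $F_{\tilde{\gamma}(n)}(x)$.
Recalling (32), and noting that for $1 - (n-1)\delta \leq x \leq 1$, $\mu_n\left(\frac{1-x}{n-1}\right)$ can be re-expressed using (27) as

$$\mu_n\left(\frac{1-x}{n-1}\right) = \sum_{k=n-1}^{M-1} \binom{M-1}{k} \left(-\frac{1-x}{n-1}\right)^{k} \sum_{i=0}^{n-1} \binom{n-1}{i} (-1)^i i^k, \tag{90}$$

it follows using Lemma 1 that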

M-l 



F^{n) {x) 



n-l / ,x / ■ \k-^ P {N+M-'. 



J X 





fc 



r=l s=q—p 



1--) y^e-'-J'dy. (91) 

By applying the transformation = | along with some elementary algebraic manipulations, the remaining 
integral is evaluated as 

> s 

X \ y 



X 



^ y) e^y'^^ 



E (-1 



\k-3j.k-3-s-lJi-j 



[r(j - k + s + l,rx) - T - k + s + lj-^^ 



Substituting this expression into (91), we readily obtain the result (34). A closed-form expression for
$F_{\hat{\gamma}(n)}(x)$ can be obtained in a similar manner, and is omitted due to space limitations.

Appendix F 

Asymptotic expansion of c.d.f.s of $\tilde{\gamma}_k(n)$ and $\hat{\gamma}_k(n)$ for large $x$

First note that the tail behavior (large $x$) of $F_{\max}(x)$ is given by [15]

$$F_{\max}(x) = 1 - \frac{e^{-x} x^{M+N-2}}{\Gamma(M)\Gamma(N)} + O\left(e^{-x} x^{M+N-3}\right). \tag{92}$$

Then, the corresponding expansion for the term $F_{\max}(\cdot)$ in both (34) and (35) follows immediately. In the
following, we require a corresponding expansion for the remaining terms in (34) and (35). First consider
(34). Since the remaining terms in this case involve the upper incomplete gamma function $\Gamma(n, x)$, we
require an asymptotic expansion for $\Gamma(n, x)$ as $x \to \infty$. Using the definition and integrating by parts,
for large $x$ we have $\Gamma(n, x) = e^{-x} x^{n-1}\left[1 + \frac{n-1}{x} + \frac{(n-1)(n-2)}{x^2} + \cdots\right]$. Since i < 1, the terms that decay

most slowly in the summation in (34) can be expressed as 



_|_ j-k+s _|_ {j-k+s){j-k+s-l) _j_ _ _ 



(93) 



where 

Ck = 

Using (82) we can obtain 



M-1 
k 



i=0 



i J \n — 1 



(94) 



r^") (-Ij^^-i/"*-!) =0, l<m<k, 



j=0 

k 



Y.[%-lf-^j^ = k\, k>l, (95) 



j=0 

from which it follows that in (93), Y!'j=o (J) (-l)^-^' ^^=^^^~j'J"'^^~''^ = for 1 < m < A; - 1, and also 
that E-=o {'M-lf-' ^-'^-^^"''"'^ = f . We then have 



\( _-[\k-jYt=iit±+f+l: 

M-l N+M-2 

Oi 

gx- yrj-K yrj. 

k=n-l s=q-p 

which upon substituting for Ck and applying some manipulations using (95) gives 



_ (M-l)!(n-l)! M+iv-n-i 
- (M-n)!(n-l)«-i"^'^+^-^' 

+ 0(e-^x^+^-"-2) . (97) 

From (92), we have $f_{\max}(x) = \frac{e^{-x} x^{M+N-2}}{\Gamma(M)\Gamma(N)} + O\left(e^{-x} x^{M+N-3}\right)$. Therefore $a_{1,N+M-2} = \frac{1}{\Gamma(M)\Gamma(N)}$. Together
with (97) and (92), we have (36). By using a similar method, the terms that decay most slowly in the
summation in (35) can be obtained. That result, used with (92), yields (37).
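The integration-by-parts expansion of the upper incomplete gamma function used in this appendix can be illustrated for integer order, where $\Gamma(n, x) = (n-1)!\, e^{-x} \sum_{k=0}^{n-1} x^k / k!$ holds exactly. The sketch below compares this exact value with truncations of $e^{-x} x^{n-1}\left[1 + \frac{n-1}{x} + \frac{(n-1)(n-2)}{x^2} + \cdots\right]$; the function names are hypothetical and only the Python standard library is used.

```python
import math

def upper_gamma_int(n, x):
    """Exact upper incomplete gamma function Gamma(n, x) for a positive integer n."""
    return math.factorial(n - 1) * math.exp(-x) * sum(x ** k / math.factorial(k) for k in range(n))

def upper_gamma_asymptotic(n, x, terms=3):
    """Truncated expansion e^{-x} x^{n-1} [1 + (n-1)/x + (n-1)(n-2)/x^2 + ...]."""
    total, coeff = 0.0, 1.0
    for j in range(terms):
        total += coeff / x ** j
        coeff *= (n - 1 - j)   # next coefficient: (n-1)(n-2)...(n-1-j)
    return math.exp(-x) * x ** (n - 1) * total

if __name__ == "__main__":
    n, x = 5, 50.0
    exact = upper_gamma_int(n, x)
    for terms in (1, 2, 3, 5):
        approx = upper_gamma_asymptotic(n, x, terms)
        print(terms, abs(approx - exact) / exact)
```

For integer $n$ the series terminates after $n$ terms and reproduces the exact value, while keeping only the leading term leaves a relative error of roughly $(n-1)/x$, which is the sense in which the lower-order terms are absorbed into the $O(\cdot)$ remainder.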

Appendix G 
Proof of Lemma 8 

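Recall that $F_{\hat{\gamma}_{\pi(n)}}(x) \leq F_{\gamma_{\pi(n)}}(x) \leq F_{\tilde{\gamma}_{\pi(n)}}(x)$. For $\gamma_{\pi(n)}$, $n \in \{2, \ldots, M\}$ and large $K$, with (40),
$\Pr\{u_n - \log\log\sqrt{K} \leq \gamma_{\pi(n)}\} \geq \Pr\{u_n - \log\log\sqrt{K} \leq \tilde{\gamma}_{\pi(n)}\} \geq 1 - O\left(\frac{1}{\log K}\right)$. Similarly, with (41)
we have $\Pr\{\gamma_{\pi(n)} \leq \chi_n + \log\log\sqrt{K}\} \geq \Pr\{\hat{\gamma}_{\pi(n)} \leq \chi_n + \log\log\sqrt{K}\} \geq 1 - O\left(\frac{1}{\log K}\right)$. Thus,

$$\Pr\left\{u_n - \log\log\sqrt{K} \leq \gamma_{\pi(n)} \leq \chi_n + \log\log\sqrt{K}\right\} \geq 1 - O\left(\frac{1}{\log K}\right). \tag{98}$$

For $n = 1$, the asymptotic distribution of $\gamma_{\pi(n)}$ has been characterized in [14]. Using that result, along
with (98), the lemma follows upon noting that $\zeta_{\pi(n)} = \rho \gamma_{\pi(n)}$.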

Appendix H 
Proof of Theorem 1 

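Using (45) we can obtain

$$\Pr\left\{ \frac{\log_2(1 + \varpi_n - \rho\log\log\sqrt{K})}{\log_2[\rho\log K]} \leq \frac{\log_2(1 + \zeta_{\pi(n)})}{\log_2[\rho\log K]} \leq \frac{\log_2(1 + \nu_n + \rho\log\log\sqrt{K})}{\log_2[\rho\log K]} \right\} \geq 1 - O\left(\frac{1}{\log K}\right).$$

Substituting (46) and (47) and letting $K \to \infty$, the left-hand side and right-hand side of the
inequality within $\Pr\{\cdot\}$ converge to the same value. Thus, $\lim_{K \to \infty} \frac{\log_2(1 + \zeta_{\pi(n)})}{\log_2[\rho\log K]} = 1$ with probability 1,
and (48) holds. To establish (49), we employ the following upper bound on $\mathcal{E}\{R_{\text{BC}}\}$ derived in [16]:

$$\mathcal{E}\{R_{\text{BC}}\} \leq M \log_2\left(1 + \rho(\log K + O(\log\log K))\right). \tag{99}$$

From Lemma 8, we have $\Pr\left\{\log_2(1 + \zeta_{\pi(n)}) \geq \log_2(1 + \varpi_n - \rho\log\log\sqrt{K})\right\} \geq 1 - O\left(\frac{1}{\log K}\right)$. Thus,

$$\begin{aligned}
\mathcal{E}\{R_{\text{BC}}\} - \mathcal{E}\{R_{\text{ZFDPC-SUS}}\}
&\leq M \log\left(1 + \rho(\log K + O(\log\log K))\right) - \left(1 - O\left(\frac{1}{\log K}\right)\right) \sum_{n=1}^{M} \log\left(1 + \varpi_n - \rho\log\log\sqrt{K}\right) \\
&= \sum_{n=1}^{M} \log\left(1 + \frac{O(\log\log K)}{1 + \varpi_n - \rho\log\log\sqrt{K}}\right) + O\left(\frac{1}{\log K}\right) M\, O(\log\log K) \\
&\sim O\left(\frac{\log\log K}{\log K}\right)
\end{aligned} \tag{100}$$

where we have used $\log(1 + x) \sim x$ for $x \ll 1$, and $x \sim y$ means $\lim_{K \to \infty} x/y = 1$.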
Appendix I
Proof of Theorem 2



From [13], for small enough S, £>,r(n) > i+ejs) ' where e{S) = i_(^m-i)s - Using this result, together 
with (99) and (45), and following a similar method as in Appendix H, we have 

^{^Bc} — ^{^ZFBF-SUs} 

< M log (1 + p(log K + 0(log log K))) + iTi§) J j 

< M log (1 + p(log K + 0(log log K))) - 5^ (l - o(y^) ) loi 

n=l ^ V °g / / 



^ ^ ^ ' ^ ro^ - p log log V-ft^ 



1 + 



1 + e(5) 



J^log 1 + - 

n=l ^ ^ 



p(e((5)logK + 0(loglogi^)) ^^ _^^/^loglogi^ 



+ (tu„-ploglogV:^)E~o(-e(<^))'^ V log^ 

- ^^^^^ 
where we have used the fact that for small enough $\delta$, $|e(\delta)| < 1$, thus $\frac{1}{1 + e(\delta)} = \sum_{i=0}^{\infty} (-e(\delta))^i$. We
can see that as long as $e(\delta) \sim o(1)$, or equivalently $\delta \sim o(1)$, whilst satisfying the conditions in (29), the
difference will become zero as $K \to \infty$. However, ZFBF-SUS with a smaller candidate set at
each iteration (i.e., reduced $|\mathcal{U}_n|$) cannot achieve a higher sum rate than ZFBF-SUS with a larger candidate
set at each iteration. Thus, with larger $\delta$, there will be more candidate users for each iteration and the
average sum rate will increase, or at least be maintained. So the condition $\delta \sim o(1)$ can be ignored, thereby
establishing (50). From (101), the difference in sum rate is at most $O\left(\frac{\log\log K}{\log K}\right)$.